AD-A201 042

UNCLASSIFIED

REPORT DOCUMENTATION PAGE

Report Number: AIM 1070
Title: An Operating Environment for the Jellybean Machine
Type of Report: memorandum
Author: Brian K. Totty
Contract or Grant Number: N00014-80-C-0622
Performing Organization: Artificial Intelligence Laboratory, 545 Technology Square, Cambridge, MA 02139
Controlling Office: Advanced Research Projects Agency, 1400 Wilson Blvd., Arlington, VA 22209
Report Date: May 1988
Number of Pages: 156
Monitoring Agency: Office of Naval Research, Information Systems, Arlington, VA 22217
Security Class. (of this report): UNCLASSIFIED
Distribution Statement (of this report): Distribution is unlimited
Distribution Statement (of the abstract): Unlimited
Supplementary Notes: None
Key Words: operating systems, distributed systems, jellybean machine, networks, parallel processing, virtual memory, ensemble machines
Abstract: see back of page

Block  20  cont'd 


The Jellybean Machine is a scalable MIMD concurrent processor consisting of special-purpose RISC
processors loosely coupled into a low latency network. The problem with such a machine is to find a way to
efficiently coordinate the collective power of the distributed processing elements. A foundation of efficient,
powerful services is required to support this system.

To  provide  this  supportive  operating  environment,  I  developed  an  operating  system  kernel  that  serves 
many of the initial needs of our machine. This Jellybean Operating System Software provides an
object-based storage model, where typed contiguous blocks act as the basic metric of storage. This memory model
is  complemented  by  a  global  virtual  naming  scheme  that  can  reference  objects  residing  on  any  node  of  the 
network.  Migration  mechanisms  allow  object  relocation  among  different  nodes,  and  permit  local  caching  of 
code.  A  low  cost  process  control  system  based  on  fast-allocated  contexts  allows  parallelism  at  a  significantly 
fine  grain  (on  the  order  of  30  instructions  per  task). 

The  system  services  are  developed  in  detail,  and  may  be  of  interest  to  other  designers  of  fine  grain, 
distributed  memory  processing  networks.  The  initial  performance  estimates  are  satisfactory.  Optimizations 
will  require  more  insight  into  how  the  machine  will  perform  under  real-world  conditions. 

Chapter 1


Introduction 


I  am  the  people  —  the  mob  —  the  crowd  —  the  mass 
Do  you  know  that  all  the  great  work  of  the  world  is  done  through  me? 

—  Carl  Sandburg,  in  I  Am  the  People,  the  Mob  (1916) 

Power  is  the  great  aphrodisiac. 
—  in  The  New  York  Times  (January  19,  1971) 

Concurrent processing is becoming a progressively more popular field in computer
science. The vision of harnessing previously undreamt-of computational power at a reasonable
cost is leading the drive. By connecting many moderately powerful microprocessors in a
communications medium, system designers hope to take advantage of the collective
power of the architecture to solve tasks that were previously time- or cost-prohibitive.

Unfortunately, the eager concurrent system designer soon finds that many issues
are still unresolved. Though people have a fairly good grasp of ways to build successful
sequential machines, it is less clear how to build optimal, or even acceptable, concurrent
systems. The designer is soon faced by a barrage of questions that are difficult to answer.
“What grain of parallelism should be supported?” “What level of functionality should the
processors provide?” “How should the processors communicate?” “How tightly coupled
should the processors be?” “How should memory be managed?” “How should the load be
distributed?” Many research groups are attempting to answer these questions at this very
moment.

Some insight into concurrent architectures has been gained over the years, and
the current directions of research reflect the knowledge gained. Multicomputer networks
(sometimes called “ensemble machines”) are one direction that concurrent systems research
has taken. This genre of machine connects relatively conventional microprocessors via an
automatically routed network. The design is attractive because it exploits well
understood sequential processor technology for the processing nodes, and the performance of
the system can grow proportionately with the number of processors, providing scalability.

For  the  past  two  years,  the  Concurrent  VLSI  Architecture  Group  at  M.I.T.  has  been 
designing  a  concurrent  processing  network,  christened  the  Jellybean  Machine,  under  the 
direction  of  Professor  William  Dally  [Dal86c].  The  goal  of  the  Jellybean  Machine  project  is 
to  design  a  scalable  concurrent  processor  out  of  low-priced  (jellybean)  parts,  that  efficiently 
supports  an  object-oriented  execution  model.  The  processor  is  targeted  at  both  symbolic 
and  numeric  applications,  and  will  be  programmed  in  high-level,  object-oriented  languages. 
We hope it will serve as a successful example and a test bed for advanced concurrent systems
research.


1.1  Scope  of  Thesis 

This thesis report describes the design and implementation of an operating system prototype
for the J-Machine. The operating system was required to support a global namespace across
the distributed processors, allocate memory in an object-based storage model, support
inter-processor communication, and provide system services for code execution, object
migration, and an object-oriented calling model. It also provided a perch from which more
advanced issues in system design could be studied.


1.2  Highlights  of  Contributions 


In  the  course  of  the  design  of  the  J-Machine  operating  system,  several  ideas  were  developed 
that  may  be  of  special  interest  to  the  designer  of  multicomputer  networks. 

•  In section 3.4, I describe a virtual addressing system that resolves object names
across distributed nodes by a mechanism known as hometown addressing. This scheme
delegates to object birthnodes the responsibility for knowing current object residences,
permitting object migration. An accompanying mechanism of “hints” is provided to
improve performance.

•  To simplify the hardware with minimal cost in flexibility, we have developed an
explicit, one-time virtual translation scheme via the XLATE machine instruction, which
converts a virtual address to a physical one. Retranslation is provided for
automatically by fault handlers.

•  Chapter 5 describes a low overhead code execution model that supports inexpensive
remote procedure calls, local caching of code, and convenient suspension and
resumption of processes.

•  Section 5.4 describes a system for fast context creation that involves the re-use of old
context objects. This is an important optimization because contexts are short-lived and
are allocated very frequently.

•  Section 5.6 outlines a simple, fast resource distribution mechanism that limits
bottlenecks and cross network traffic by dynamically creating a type distribution tree
for the resource.
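
The hometown addressing idea from the first item above can be illustrated with a small Python sketch. This is not the JOSS implementation: the class and method names (`Node`, `new`, `migrate`, `locate`), the encoding of an object ID as a (birthnode, serial) pair, and the use of a shared dictionary in place of network messages are all assumptions made for illustration.

```python
class Node:
    """One node of a toy network using hometown addressing (illustrative only)."""

    def __init__(self, node_id, network):
        self.node_id = node_id
        self.network = network      # shared dict: node_id -> Node (stands in for the network)
        self.next_serial = 0
        self.residence = {}         # birthnode table: object ID -> node currently holding it
        self.hints = {}             # possibly stale guesses: object ID -> node
        self.store = {}             # objects physically resident on this node
        network[node_id] = self

    def new(self, value):
        """Create an object here; its ID records this node as the birthnode."""
        oid = (self.node_id, self.next_serial)
        self.next_serial += 1
        self.store[oid] = value
        self.residence[oid] = self.node_id
        return oid

    def migrate(self, oid, dest_id):
        """Move a resident object and tell its birthnode where it went."""
        value = self.store.pop(oid)
        self.network[dest_id].store[oid] = value
        self.network[oid[0]].residence[oid] = dest_id

    def locate(self, oid):
        """Return the node currently holding oid, trying a local hint first."""
        if oid in self.store:
            return self.node_id
        guess = self.hints.get(oid)
        if guess is not None and oid in self.network[guess].store:
            return guess                              # the hint was still valid
        home = self.network[oid[0]].residence[oid]    # fall back to the birthnode
        self.hints[oid] = home                        # remember the answer as a hint
        return home


network = {}
a, b, c = Node(0, network), Node(1, network), Node(2, network)
oid = a.new("triangle")
a.migrate(oid, 2)
where = b.locate(oid)   # b has no hint yet, so the birthnode (node 0) answers
```

The point of the design is that only the birthnode must be kept up to date when an object moves; hints held elsewhere may go stale and are simply checked before use.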

1.3 A Closer Look At The Jellybean Machine


The J-Machine is composed of many custom RISC microprocessors called Message-Driven
Processors or MDPs. These processing elements have small, local memories and are
connected in a loosely coupled network. Inter-node communication is provided via message
sends that are automatically routed to the proper destination nodes. A virtual
object-based memory abstraction is built over the distributed nodes, providing a uniform global
namespace. Various levels of low-cost execution control provide a reasonably fine grain
of concurrency (on the level of 30 instruction procedures). An object-oriented execution
model is built upon this fine-grain execution model. The rest of the system implements
miscellaneous system services and mechanisms to improve performance.

1.4 Background

Concurrent architecture design has been seriously studied for at least the past fifteen years,
but there is still much to be learned. The various visions of machines, operating systems,
and target applications are so diverse that few definitive statements can be made.

We see SIMD parallelism, promoted by vector operations as seen in the Cray. More
complicated architectures like the Connection Machine [Hil85], and systolic array processors
like the Warp [Kun82] are alternative approaches, providing fine-grain concurrency with
repetitive processing while permitting reconfiguration. MIMD architectures are just as
diverse. There are extremely fine-grain dataflow machines like the Manchester Machine,
Sigma-1, and the MIT Tagged-Token Dataflow Machine [Aea80], bus-based shared memory
architectures like the IBM RP3, Inmos Transputer, and C.mmp [WLH81], multicomputer
networks like the Cosmic Cube [Sei85] and Cm* [OSS80], and distributed systems like System
R* [Lin80].

The  Jellybean  Machine,  while  borrowing  ideas  from  successful  research  endeavors, 
has  goals  unique  enough  to  gain  a  somewhat  different  character  from  other  machines  of 
its  genre.  It  communicates  via  message  passing  and  addresses  only  local  memory,  as  in 
the  Cosmic  Cube  [Sei85]  and  the  Medusa  system  [OSS80].  On  the  other  hand,  these  two 
systems  control  execution  by  a  system  of  pipes  and  locks,  where  processes  wait  for  data  to 
arrive  via  messages.  The  J-Machine,  instead,  uses  message  sends  to  schedule  processes,  and 
not  to  provide  socket-to-socket  communication.  State  manipulation  doesn’t  involve  explicit 
connections between running processes. Instead, return values are propagated to
slots in contexts, and code is executed when results arrive, in a more “functional” manner.

Many systems also have virtual memory, and some systems use an object or segment
based storage model [WLH81] as does the J-Machine, but the emphasis is slightly different
in our design. Where most systems use a virtually addressed, multi-level memory system
to expand primary memory and provide relative address mapping, the J-Machine uses a
virtual addressing system to provide a global namespace across all nodes and to provide
convenient access to objects as the primitive memory metric. This is more similar to large,
complex distributed systems such as IBM’s distributed database, System R* [Lin80], than
to conventional parallel processors.

Finally, the J-Machine targets itself to a high-level programming environment. The
RISC processing node, called the Message-Driven Processor [HT88], provides a fast, powerful
substrate for the execution of high-level languages, such as Smalltalk. There are several
architectures designed for the efficient execution of high-level language applications, such
as the Symbolics Lisp Machine and the SOAR Smalltalk processor [Ung87], but very little
work has been done targeting concurrent processors to high-level languages.


1.5  Organization 

The rest of this report will discuss the structure of the Jellybean system. Chapter 2 provides
a high-level layering of the Jellybean system, from single processing node hardware to the
high-level programming of the entire concurrent processing network. Chapter 3 describes
the memory management and addressing system. Chapter 4 discusses the machine as a
distributed system supporting object migration to balance load. Chapter 5 explains code
execution on the method level, and chapter 6 details the object-oriented calling extensions. Storage
reclamation issues will be introduced in chapter 7. Chapter 8 discusses some of the services
provided to support high-level language constructs and to control code execution. Chapter
9 describes the prototype operating system implementation, noting its successful as well as
not-so-successful features, and discussing some of the difficulties and quirks faced by the
system designer. The report concludes with a performance evaluation and summary in
chapters 10 and 11.


Chapter  2 

The  Execution  Model  of  the 
Jellybean  Machine 

These  unhappy  times  call  for  the  building  of  plans  ... 
that  build  from  the  bottom  up  and  not  from  the  top  down 

—  Franklin  Delano  Roosevelt,  in  his  April  17,  1932  Radio  Address 

The  Jellybean  Operating  System  Software  (JOSS)  is  built  in  a  layered  manner  where 
each  layer  provides  a  different  model  of  functionality  to  the  machine.  Figure  2.1  attempts  to 
describe  this  layering,  and  what  new  functionality  each  layer  provides  to  the  entire  system. 


At the bottom of the figure lie the base processor and boot code. At this stage,
the processing node can be initialized, and can run independently as a limited microprocessor.
The addition of system call and fault handlers provides a level of system services
and robustness to the microprocessor, allowing it to allocate memory in an object-based,
virtually addressed manner, and to handle various types of exceptional conditions at run
time. These first two levels of the Jellybean system build up the abstract processing node
capable of executing machine code and performing a set of system services.


Figure 2.1: Layering of Jellybean System

Functionality provided by each layer of the execution model, from the top layer down:

    User programming language
    Simple machine-independent target language
    Class/selector calling model
    Remote method calls
    Communication; distributed namespace; concurrent computing
    Object-based memory allocation; optimistic code generation
    Virtual namespace; assorted system services
    Simple instruction set; tagged, local memory; fast priority switches


Concurrency  is  provided  as  the  next  level  of  functionality  by  the  introduction  of 
primitive  message  handlers.  Each  processing  node  has  the  ability  to  send  messages  to  any 
other  node,  where  a  message  is  simply  a  physical  address  to  start  running  on  a  foreign  node, 
followed  by  routine-specific  data.  Thus,  a  Jellybean  primitive  message  is  actually  just  a  way 
of  changing  a  program  counter  of  a  remote  node.  A  set  of  common  operations  can  be  placed 
in  identical  physical  memory  locations  on  each  node,  so  that  an  operation  can  be  run  on  any 
node  by  mailing  that  routine’s  address  to  the  node.  The  operating  system  provides  a  small 
set  of  primitive  message  handlers  to  perform  common  operations  which  reside  in  the  same 
locations  on  each  node.  With  this  small  set  of  locked-down  routines,  the  machine  gains  the 
ability  to  compute  concurrently,  to  use  a  global  addressing  abstraction  over  the  physically 
distributed  memories,  and  to  perform  some  amount  of  object  migration  and  other  control 
of  resources. 

Two primitive message handlers are special, in that other system services are
built on top of them. The CALL message handler provides a mechanism for starting code
contained in virtually-addressed relocatable objects, rather than just code that resides at
locked-down physical addresses. This provides a convenient way of packaging objects and
supporting remote procedure calls. The SEND message takes the code execution mechanism
to an even higher level, and provides for a dispatch-on-type calling model as used in
object-oriented systems like Flavors or Smalltalk.

The final two layers of the system are the interfaces for the programming models.
Under this highest level of abstraction, the Jellybean Machine appears to the user as a system
for running high-level languages like Smalltalk.

The  rest  of  this  chapter  will  go  into  the  abstractions  in  more  detail,  describing  what 
functionality  each  level  of  the  machine  provides.  It  may  be  helpful  to  refer  back  to  figure 
2.1  as  you  read  the  following  sections. 

2.1 The Processing Node

Each node of the Jellybean multiprocessor (a Message-Driven Processor) is a
tagged-architecture microprocessor with a small on-chip memory and separate register sets for
operating at two priority levels.

2.1.1  Machine  Code 

The machine code interpreted by a Message-Driven Processor (MDP) is a simple three-operand
instruction set [HT88]. Code is executed sequentially, and changes in control are provided
by simple conditional and unconditional branches. The instruction stream is accessed via
two registers, one that points at the base of the code block (A0), and one that indicates
the current offset into this block (IP).

2.1.2  System  Calls 

The  processor  also  has  a  small  fixed  length  stack,  and  a  mechanism  to  make  system  calls. 
This  provides  us  with  the  ability  to  change  control  to  common  subroutines,  and  easily  restore 
execution  upon  return.  The  addition  of  the  system  call  machinery  gives  us  the  ability  to 
provide  several  extensions  to  the  processor  in  terms  of  system  services  written  in  machine 
code. Heap management and an object-based memory allocation model are provided through
system calls, as are the mechanisms to address these objects with relocatable, virtual IDs.

2.1.3  Fault  Handlers 

Similar to system calls, the MDP also contains a fault handler table providing software
routines to run when instructions fault because of various exception conditions (tag
mismatches, addressing past segment, integer overflow, translation buffer lookup miss, etc.).
When a fault occurs, the IP is pushed onto the stack, and the appropriate fault routine
(found in the exception vector table) is run. The address of each fault handler is placed
in the exception vector table by software initialization. The addition of the fault handlers
gives us several advantages in our quest for an object-oriented concurrent processor. We can
use tag checking to support optimistic code generation and a type of “generic operation”
approach on the machine code level. The fault handlers also give us the ability to
efficiently implement virtual ID lookup via the XLATE instruction. The fault handlers will be
described in more detail later when the entire system has been more thoroughly explained.
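
As a rough illustration of optimistic code generation backed by a software-initialized vector table, the Python sketch below models tagged words as (tag, value) pairs. The tag names, the `TagMismatch` exception, and the behavior of the fallback routine are hypothetical; the real MDP performs the tag check in hardware and vectors through its exception table rather than through a Python dictionary.

```python
INT, OBJ = "INT", "OBJ"          # hypothetical tag names


class TagMismatch(Exception):
    """Stands in for the hardware fault raised when operand tags are not both INT."""


def generic_add(a, b):
    """Software fallback for non-integer operands; here it concatenates payloads."""
    return (OBJ, str(a[1]) + str(b[1]))


# Software-initialized vector table: fault kind -> handler routine.
fault_vectors = {TagMismatch: generic_add}


def add(a, b):
    """Optimistic ADD on (tag, value) words."""
    try:
        if a[0] != INT or b[0] != INT:
            raise TagMismatch()
        return (INT, a[1] + b[1])                    # common case: no handler is run
    except TagMismatch as fault:
        return fault_vectors[type(fault)](a, b)      # vector to the fault handler


fast = add((INT, 2), (INT, 3))
slow = add((OBJ, "ab"), (INT, 3))
```

Only the second call pays for the software path, which is the sense in which the code is "optimistic". Replacing an entry in `fault_vectors` changes what the same instruction means for non-integer operands.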

Since  both  the  system  calls  and  fault  handlers  are  supported  by  a  software  initialized 
vector  table,  the  processor  can  be  “reshaped”  into  a  different  type  of  machine  by  replacing 
the  ROM  code  that  sets  up  this  table.  Only  the  instruction  set  is  fixed,  allowing  the  MDP 
processing  node  to  be  used  as  a  basis  for  various  alternative  concurrent  processing  system 
paradigms. 

2.1.4  The  Basic  Node  of  Computation 

With what we have described so far, our processor is a sequential machine, able to
execute at either of two priority levels. It refers to its instruction stream using physical memory
base  and  offset  registers.  The  addition  of  the  system  calls  provides  an  interface  to  OS 
services,  such  as  those  to  allocate  memory,  generate  virtual  object  IDs  and  to  manage  object 
ID  to  physical  address  translation.  The  fault  handlers  permit  us  to  develop  “optimistic” 
code,  where  a  normal,  error-free  execution  will  proceed  rapidly,  and  we  only  pay  the  price  of 
software  execution  if  an  error  condition  occurs.  The  fault  handlers  are  also  used  to  support 
a  fast  virtual  namespace,  where  translation  can  be  as  fast  as  the  XLATE  instruction. 

The  sum  is  a  flexible,  object-based  microprocessor  that  will  serve  as  our  basic  node 
of  computation  as  we  venture  into  the  realm  of  concurrency. 

2.2 The Concurrent Processor Model

By providing mechanisms for node-to-node communication, our machine becomes a
multiprocessor, called the Jellybean Machine. Many MDP processing nodes (as well as other
potential  nodes  such  as  floating  point  processors  and  memory  nodes)  are  connected  together 
in  a  network.  Communication  between  the  nodes  is  provided  by  the  MDP  SEND  instruction 
which  injects  messages  into  the  network.  The  messages  are  routed  by  routing  hardware  to 
the  message  queues  on  the  destination  node. 

A message received by an MDP processing node consists of two parts: a message
header, which contains the address of the primitive message handler to run, and a sequence
of message-specific data words. The header of the message acts in effect like a process
descriptor  for  providing  efficient  message  execution.  When  a  message  arrives  at  the  specified 
node,  it  lands  in  the  destination  node’s  queue.  The  queue  acts  as  a  FIFO  scheduler  of 
primitive  message  processes.  When  the  message  moves  to  the  head  of  the  queue,  the  MDP 
executes  the  message  by  setting  the  instruction  pointer  register  to  point  to  the  primitive 
message  handler  whose  address  is  in  the  header  of  the  message. 
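
The queue-driven execution described above can be sketched in a few lines of Python. The sketch is an assumption-laden model, not MDP code: `MDPNode`, `install`, `deliver`, the handler addresses `0x100` and `0x140`, and the two sample handlers are invented for illustration, and a dictionary lookup stands in for loading the instruction pointer with the address found in the message header.

```python
from collections import deque


class MDPNode:
    """Toy node: a FIFO message queue plus a table of handler 'addresses'."""

    def __init__(self):
        self.queue = deque()
        self.handlers = {}      # hypothetical physical address -> handler routine
        self.log = []

    def install(self, address, routine):
        """Place a primitive message handler at a fixed address."""
        self.handlers[address] = routine

    def deliver(self, message):
        """Called by the network; a message is [handler_address, data...]."""
        self.queue.append(message)

    def run(self):
        """Execute queued messages strictly in arrival order."""
        while self.queue:
            header, *data = self.queue.popleft()
            self.handlers[header](self, *data)   # stands in for setting the IP


def new_msg(node, size):
    node.log.append(("new", size))


def echo_msg(node, text):
    node.log.append(("echo", text))


node = MDPNode()
node.install(0x100, new_msg)
node.install(0x140, echo_msg)
node.deliver([0x140, "hi"])
node.deliver([0x100, 8])
node.run()
```

Because the header already names the routine to run, no separate scheduling or decoding step sits between message arrival and execution.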

Several useful system services are written as primitive message handlers. Examples
of primitive message handlers include those to make a new object on a node (NEW_MSG)
and to request a copy of a method from a node (METHOD_REQUEST_MSG).

With  the  addition  of  primitive  messages,  we  have  the  ability  to  process  concurrently, 
and  to  support  a  distributed  namespace.  We  can  now  extend  our  virtual  memory  system 
to  support  naming  of  objects,  not  just  in  the  local  memory,  but  on  any  node  in  the  entire 
network.  With  a  distributed  namespace,  we  gain  flexibility  of  resources.  We  can  migrate 
objects  as  we  need  them  to  balance  load  and  to  free  up  memory. 

2.2.1 Methods and the CALL Message

Up to this point, we have only been able to run foreign code that resides at fixed physical
locations. We desire a more flexible mechanism for dealing with blocks of code, such as those
that will be output by compilers. Since we already have an object-based storage model,
it would be very convenient to store code routines in objects and provide a mechanism for
their execution. We call code routines stored in virtually addressed, relocatable objects
methods to differentiate them from physically locked-down code sequences. We provide a
mechanism to start these methods executing by writing a primitive message handler called
the CALL message handler. When a CALL_MSG starts executing on a node, it runs the
method indicated in the message argument. This allows us to have a flexible system of
remote procedure calls.

2.2.2  SENDing  Selectors  to  Objects 

The final operating system layer in our quest for an object-oriented execution model is
the SEND_MSG message handler. A SEND_MSG consists of a selected generic operation,
represented by a unique symbol called a selector, followed by the object(s) that the selector
acts upon. If we wanted to send the DRAW selector to an object (say a triangle), we
would SEND a SEND_MSG message to the node the triangle object resides on, passing the
selector DRAW, and the virtual address of the triangle object receiving the selector (called
the receiver). When the SEND_MSG handler gets executed, it determines the appropriate
method to run, and then remotely calls the procedure by sending a CALL_MSG message
to this method, which then draws the triangle.

In order for this system to work it is necessary to maintain certain system tables
that map pairs of selectors and object classes to the virtual IDs of methods that perform
the desired operation. It is also necessary to ensure that semantically identical selector
operations get the same selector symbol. In other words, all PLUS operations must get the
same symbol representing +. The exact mechanisms of the class/selector system will be
described in more detail in chapter 6.
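
A minimal Python sketch of the SEND-then-CALL path follows, under several assumptions: the table layout, the function names `define`, `call_msg`, and `send_msg`, and the method ID `"m17"` are hypothetical, and both handlers run in one process here, whereas on the machine each would be a message delivered to a node.

```python
class Triangle:
    """Stand-in for an object of class Triangle."""


methods = {}         # method ID -> code (stands in for a relocatable method object)
method_table = {}    # (selector, class name) -> method ID


def define(selector, cls, method_id, code):
    """Register the method that implements a selector for a class."""
    methods[method_id] = code
    method_table[(selector, cls.__name__)] = method_id


def call_msg(method_id, *args):
    """CALL: run the method named by a (virtual) method ID."""
    return methods[method_id](*args)


def send_msg(selector, receiver, *args):
    """SEND: choose the method from the selector and receiver class, then CALL it."""
    method_id = method_table[(selector, type(receiver).__name__)]
    return call_msg(method_id, receiver, *args)


define("DRAW", Triangle, "m17", lambda obj: "drew a " + type(obj).__name__)
result = send_msg("DRAW", Triangle())
```

Keying the table on the (selector, class) pair is what makes it essential that every semantically identical operation shares a single selector symbol.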


2.3  High  Level  Language  Model 

For  the  final  part  of  our  tour  of  the  Jellybean  Machine,  let  us  step  back  once  more,  and 
view  the  machine  from  the  perspective  of  the  programming  languages  that  will  be  used  to 
write  user  programs. 

2.3.1  Intermediate  Code 

To provide a uniform target language for compilers, we have specified an intermediate
language called i-code. This language has a simple set of operations, and a simple manner of
referencing operands. By passing the i-code through a code generator and a linker/loader
we can store actual MDP machine code on nodes. The i-code level of the system provides a
convenient entry point for various compilers that requires no knowledge of the underlying
layers. All interaction is via the protected subsystem of the i-code interface. This interface,
in effect, provides an abstract i-code machine that can be of use in many different machine
configurations. Implementations of this interface on different machine architectures would
provide a convenient way to reuse compilation tools and compare system performance.

2.3.2  User  Languages 

The  user  language  model  is  what  would  be  seen  by  the  user  of  the  Jellybean  Machine.  He/she 
would  be  faced  with  the  language  interaction  shell  and  would  see  none  of  the  internal  layers 
that  compose  the  system.  The  currently  supported  user  language  is  a  prefix  notation  form 
of concurrent Smalltalk [DC]. Other languages, such as a Lisp with Flavors, should also be
possible.


Chapter  3 


Memory  Management  and 
Addressing  System 


Work  without  hope  draws  nectar  in  a  sieve 
And  hope  without  an  object  cannot  live 

—  Samuel  Taylor  Coleridge,  in  Work  Without  Hope 

Oh  call  it  by  some  better  name 
For  friendship  sounds  too  cold. 

—  Thomas  Moore  in  Ballads  and  Songs:  Oh  Call  It  by  Some  Better  Name 

The Jellybean Machine, targeted for object-oriented applications, needs to have an
object-based storage model. This chapter sketches the mechanisms that interact to provide
this model. The mechanisms consist of two parts: (1) the services to allocate and
deallocate contiguous blocks of physical memory, and (2) the virtual addressing abstractions
that make objects the basic unit of storage. This virtual addressing allows object relocation
and provides a way to reference storage on foreign nodes. Virtual naming and physical
allocation systems combine to form an object-based programming system.

Figure 3.1: Schematic Model of the Memory System


At the heart of the object-based system is the NEW system call, which creates a
new object. This routine utilizes the three subsystems of the object system: the translation
manager, the name manager, and the memory manager. The interaction of these subsystems is
shown in figure 3.1.

3.1 “Freetop” Contiguous Heap Allocation

Each  node  of  a  Jellybean  Machine  has  its  own  local  memory  that  can  be  accessed  very 
rapidly.  Part  of  this  local  memory  is  reserved  as  a  heap  to  allocate  blocks  of  memory  from. 
Heap  allocation  is  done  in  a  straightforward  “freetop-next”  manner.  Memory  is  allocated 
starting  from  the  current  top  of  free  memory,  and  the  freetop  pointer  is  moved  past  the 
block  allocated.  The  ALLOC  system  call  handles  the  allocation  requests. 
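
A freetop allocator amounts to a single pointer bump. The Python sketch below is a hedged illustration only: the `Heap` class, its word-indexed layout, the upward growth direction, and the choice to return `None` on exhaustion are assumptions, not the behavior of the actual ALLOC system call.

```python
class Heap:
    """Bump ('freetop') allocator over a fixed-size local memory."""

    def __init__(self, size):
        self.size = size
        self.freetop = 0            # index of the first free word

    def alloc(self, length):
        """Return the base of a new block, or None if the heap is exhausted."""
        if self.freetop + length > self.size:
            return None             # a caller might compact the heap and retry
        base = self.freetop
        self.freetop += length      # move freetop past the allocated block
        return base


heap = Heap(16)
first = heap.alloc(4)
second = heap.alloc(10)
third = heap.alloc(4)       # does not fit: only 2 words remain
```

Allocation cost is constant and independent of how many objects exist, at the price of leaving holes behind when objects are deleted.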


3.2  Compaction  is  Fast 

Deletion of objects fragments the heap, leaving unused “holes”. We reclaim this
storage by sweeping objects down toward the base of the heap to fill up the blank space,
with the freetop following accordingly. Since each local memory is small and fast, and
each processor can sweep in parallel, compaction takes very little time. Figure 3.2 shows a
process of heap allocation, deletion, and compaction.
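
The sweep can be sketched as a single pass over the blocks in address order. This Python version is illustrative only: the block records (dictionaries with `base`, `length`, and `live` fields) and the function name `compact` are assumptions, and a real implementation would also have to update the address bindings of every object it moves.

```python
def compact(blocks, memory):
    """Slide live blocks toward the heap base.

    blocks: list of dicts with 'base', 'length', 'live', sorted by base.
    memory: list of words, modified in place.
    Returns (surviving blocks with new bases, new freetop).
    """
    freetop = 0
    survivors = []
    for block in blocks:
        if not block["live"]:
            continue                        # a hole: its space is reclaimed
        base, length = block["base"], block["length"]
        if base != freetop:
            # Copy the block down; the source slice is read before the write.
            memory[freetop:freetop + length] = memory[base:base + length]
        survivors.append({"base": freetop, "length": length, "live": True})
        freetop += length
    return survivors, freetop


memory = list("AAABBCCCC") + [None] * 3
blocks = [
    {"base": 0, "length": 3, "live": True},
    {"base": 3, "length": 2, "live": False},   # deleted object
    {"base": 5, "length": 4, "live": True},
]
blocks, freetop = compact(blocks, memory)
```

Each node would run such a pass over its own small memory, so the sweeps proceed in parallel across the machine.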


3.3  Physical  Base/Length  Addressing 

Blocks  of  memory  are  described  by  physical  base/length  values  supported  by  the  processor’s 
primitive  ADDR  data  type.  The  base  is  the  starting  address  of  the  block  of  memory,  and  the 
length  is  used  for  access  bounds  checking.  The  format  of  an  ADDR  tagged  value  is  shown 
in  figure  3.3.  The  tag  of  the  physical  address  word  is  a  unique  number  ADDR  representing 
a  physical  address  value.  The  R  bit  is  used  to  specify  that  an  address  value  points  to  a 
relocatable  object.  The  I  bit  specifies  that  the  address  is  now  invalid.  Both  of  these  bits 
are  used  for  the  implementation  of  virtual  addressing. 
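A base/length value with its two flag bits can be modeled as a small record. The sketch below assumes nothing about the actual bit layout of figure 3.3; the field names and the exception types are hypothetical, and only the bounds check and the invalid-bit check reflect behavior described above.

```python
from dataclasses import dataclass


@dataclass
class Addr:
    base: int                  # starting address of the block
    length: int                # number of words in the block
    relocatable: bool = False  # the R bit
    invalid: bool = False      # the I bit


def load(memory, addr, offset):
    """Read word `offset` of the block described by `addr`, with checks."""
    if addr.invalid:
        raise RuntimeError("invalid address fault")
    if not 0 <= offset < addr.length:
        raise IndexError("access outside block bounds")
    return memory[addr.base + offset]


memory = list(range(100, 132))   # 32 words of pretend local memory
block = Addr(base=8, length=4)
word = load(memory, block, 3)    # reads memory[11]
```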
Figure 3.2: “Freetop” Heap Allocation, Deletion, Compaction.

Figure 3.3: A Physical Address Word Format

3.4 Virtual Addressing Extension

Having  physical  addresses  only  allows  us  to  access  objects  on  the  current  node.  It  provides 
us  no  mechanism  for  naming  objects  on  different  nodes.  For  this  reason  and  because  it  eases 
relocation  and  provides  an  object-based  storage  model,  we  extend  our  addressing  system 
from  the  local,  physical  namespace  provided  by  the  physical  ADDR  values  to  a  global, 
virtual  namespace  using  virtual  object  IDs.  A  virtual  ID  is  a  global  name  for  an  object. 

3.4.1  Creating  New  Objects 

Objects  are  created  by  the  NEW  system  call.  The  system  call  allocates  memory  with  the 
ALLOC  call,  reserving  the  first  two  words  of  the  allocated  block  of  memory  for  object  header 
information.  Once  the  block  of  memory  is  allocated,  a  unique,  virtual  ID  is  generated  with 
the  GENID  system  call.  The  first  word  of  the  block  of  memory  is  initialized  to  contain  the 
length  and  data  type  of  the  object,  and  the  second  word  is  set  to  the  virtual  ID.  Finally, 
a  virtual  ID  to  physical  address  binding  is  made  for  the  object  so  we  can  find  the  physical 
location  given  the  ID.  The  format  of  an  object  is  shown  in  figure  3.4. 

To  manage  this  virtual  namespace  efficiently,  we  need  some  operating  system  and 
hardware  support.  First  of  all,  the  processor  provides  a  matching  ID  register  for  each 
physical  address  (A)  register.  These  ID  registers  hold  the  virtual  IDs  for  the  objects  whose 
physical  addresses  are  in  the  A  registers.  We  also  provide  a  translation  buffer  as  we  will 
discuss  shortly. 

3.4.2  Virtual  Memory  System  Calls 

The GENID system call generates a new serial number, unique on the current node. The
current implementation encodes a virtual ID in two fields: a node-unique serial number and
a node number identifying the node on which the object was created. The format of this
virtual ID is shown in figure 3.5. There are also several utility routines used to manage
the virtual → physical translation table (called the Birth/Residence Address Table, or
BRAT). These routines add, look up, and remove bindings from the translation table. They
are implemented by the extended system calls BRAT_ENTER, BRAT_XLATE, and BRAT_PURGE
respectively. Finally, we provide the NEW system call to allocate and install a new
object. This service allocates physical memory, generates a virtual ID, installs the
virtual → physical binding in the BRAT, and returns both the ID and the address. The NEW
system call is to the virtual addressing model as ALLOC is to the physical addressing
model.

Figure 3.4: The Structure of an Object

Figure 3.5: A Virtual Address Word (ID) Format
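The steps performed by NEW can be sketched as follows. The `Node` class, the tuple representation of a virtual ID, and the decision to store the total block length (header included) in the first header word are assumptions for illustration; the text specifies only which pieces of information go where.

```python
class Node:
    """Toy node: a word-addressed heap, a serial counter, and a BRAT."""

    HEADER_WORDS = 2

    def __init__(self, node_number, heap_words=64):
        self.node_number = node_number
        self.memory = [None] * heap_words
        self.freetop = 0
        self.next_serial = 0
        self.brat = {}                      # virtual ID -> (base, length)

    def alloc(self, length):
        """Freetop allocation of `length` words; returns (base, length)."""
        base = self.freetop
        if base + length > len(self.memory):
            raise MemoryError("heap exhausted")
        self.freetop += length
        return base, length

    def genid(self):
        """Return a virtual ID as a (node number, serial number) pair."""
        serial = self.next_serial
        self.next_serial += 1
        return (self.node_number, serial)

    def new(self, data_words, type_tag):
        """Allocate an object, fill in its header, and bind its ID."""
        base, length = self.alloc(self.HEADER_WORDS + data_words)
        vid = self.genid()
        self.memory[base] = (length, type_tag)   # word 0: length and data type
        self.memory[base + 1] = vid              # word 1: the virtual ID
        self.brat[vid] = (base, length)          # the BRAT_ENTER step
        return vid, (base, length)


node = Node(node_number=3)
vid, addr = node.new(data_words=4, type_tag="vector")
```

The caller receives both the ID and the physical address, so it can use the object immediately without a translation.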


3.4.3 Translation Buffer

To speed up translation, each processing node has a 2-way set-associative translation buffer
and the accompanying ENTER, XLATE, and PURGE machine instructions. The XLATE
instruction will fault if no binding is found in the cache, and a software exception handler
will be run to resolve the name.
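A software model of such a buffer helps show how a miss turns into a fault and a refill. In this sketch the number of sets, the use of integer keys indexed by `key % num_sets`, and the evict-the-older-entry policy are all assumptions; the hardware's indexing and replacement rules are not described here.

```python
class TranslationFault(Exception):
    """Raised when XLATE finds no binding in the buffer."""


class TranslationBuffer:
    def __init__(self, num_sets=4):
        self.num_sets = num_sets
        # Each set holds at most two (key, value) pairs, most recent last.
        self.sets = [[] for _ in range(num_sets)]

    def _set_for(self, key):
        return self.sets[key % self.num_sets]   # keys are integers here

    def enter(self, key, value):
        ways = self._set_for(key)
        ways[:] = [(k, v) for k, v in ways if k != key]
        if len(ways) == 2:
            ways.pop(0)               # evict the older entry (assumed policy)
        ways.append((key, value))

    def xlate(self, key):
        for k, v in self._set_for(key):
            if k == key:
                return v
        raise TranslationFault(key)

    def purge(self, key):
        ways = self._set_for(key)
        ways[:] = [(k, v) for k, v in ways if k != key]


def translate(buffer, brat, key):
    """XLATE with a software fallback: on a fault, consult the full table."""
    try:
        return buffer.xlate(key)
    except TranslationFault:
        value = brat[key]             # slow path run by the fault handler
        buffer.enter(key, value)      # refill so the next XLATE hits
        return value


brat = {0: "addr-a", 4: "addr-b", 8: "addr-c"}   # all three keys map to set 0
tb = TranslationBuffer(num_sets=4)
results = [translate(tb, brat, k) for k in (0, 4, 8, 0)]
```

Because three keys compete for a two-entry set, every lookup in this example misses and is resolved from the full table.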

3.4.4 Automatic Retranslation

To keep the normal case as efficient as possible, the processing node provides an
“invalid” bit in each address (A) register. If this bit is set, it signifies that the ID and A
registers hold values that are no longer consistent. Any access through an invalid A register
will cause a fault handler to be run, which will retranslate the ID register into the A
register and continue. This way we can be “lazy” and retranslate invalid bindings only if needed.
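The benefit of the lazy approach is that an object can move several times while costing at most one retranslation per register that actually gets used. A minimal sketch, with hypothetical names and a plain dictionary standing in for the translation table:

```python
class RegisterPair:
    """One A register with its matching ID register and invalid bit."""

    def __init__(self, vid, address):
        self.vid = vid
        self.address = address
        self.invalid = False


def relocate(table, registers, vid, new_address):
    """Move an object: update the table, then only mark registers stale."""
    table[vid] = new_address
    for reg in registers:
        if reg.vid == vid:
            reg.invalid = True


def read_address(table, reg, stats):
    """Use an A register, retranslating from the ID register if it is stale."""
    if reg.invalid:
        reg.address = table[reg.vid]   # the work done by the fault handler
        reg.invalid = False
        stats["retranslations"] += 1
    return reg.address


table = {"obj-1": 100, "obj-2": 200}
regs = [RegisterPair("obj-1", 100), RegisterPair("obj-2", 200)]
stats = {"retranslations": 0}

relocate(table, regs, "obj-1", 40)   # e.g. after heap compaction
relocate(table, regs, "obj-1", 24)   # moved again before anyone looked
first = read_address(table, regs[0], stats)
second = read_address(table, regs[0], stats)
```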


3.5  Summary 


Physical  block  allocation  is  used  to  reserve  segments  of  memory.  Virtual  IDs  are  associ¬ 
ated  with  these  blocks  of  memory,  and  bindings  are  formed,  to  provide  an  “object-based” 
allocation model. This object allocation model provides the following benefits:

• An abstract memory model, where “objects” are the primitive metric of storage rather
than physical addresses.

•  A  location  independent  memory  model  with  indirection  through  a  translation  table, 
allowing  ease  of  relocation. 

•  The  ability  to  represent  the  data  types  of  objects. 

•  The  introduction  of  a  global  namespace  where  we  can  refer  to  objects  residing  on  any 
node  of  the  network. 


Chapter  4 

Distributed  System  Support 


I  pity  the  man  who  can  travel  from  Dan 
to  Beersheba  and  cry,  ’Tis  all  barren! 

—  Lawrence  Sterne,  in  A  Sentimental  Journey  (1768) 

In the previous chapter we developed an object-based allocation model and a global
naming  system.  With  this  functionality,  we  gain  much  greater  flexibility.  We  take  this 
system  one  step  further  in  this  chapter,  as  we  describe  a  mechanism  to  migrate  objects 
from  node  to  node.  This  added  ability  requires  a  few  extensions  to  the  virtual  naming 
model  presented  in  the  previous  chapter. 

4.1  The  Idea 

In  the  previous  naming  model,  virtual  IDs  were  bound  to  physical  addresses.  Since  objects 
were  not  allowed  to  migrate,  they  were  forced  to  always  reside  on  their  birthnode.  Now  that 
objects  are  allowed  to  emigrate  to  different  nodes,  we  need  to  expand  our  name  resolution 
system. In addition to virtual → physical bindings we add a virtual → node-number
binding, semantically representing a “hint” that the object in question now resides on a
different node. Figure 4.1 shows that node #1 has a hint that an object is on node #2.

Figure 4.1: An Example of Hints


4.2  Chaining  of  Hints 

These  node  number  “hints”  indicate  another  node  to  look  on  for  the  object  in  question.  The 
current  implementation  allows  chaining  of  hints  (although  cycles  will  never  form).  If  we  ever 
follow  a  path  of  hints  and  find  no  binding  for  the  object  ID,  we  then  query  the  birthnode 
which  is  required  to  have  a  path  to  the  object  in  question.  Figure  4.2  is  a  snapshot  of  a 
system  where  a  chain  of  hints  has  formed  to  an  object. 

A question then arises as to how long to let these chains of hints be. Some distributed
systems, such as System R* [Lin80], only allow paths of length 1, i.e. one hint. If the
object is not one hint transition away, the system then defaults to the birthnode, where
the location of the object is found, and the previous incorrect hint is updated. However,
in our system we choose to allow multiple hints because objects may migrate quite a bit,
and a one-hint limit would increase the number of birthnode accesses. Performance could
degrade significantly if a popular object moved often (as we would expect popular objects
to do). If we notice in later performance experiments that chains of hints become
commonplace, adding latency and unnecessary network traffic, we can adopt one of two
solutions: (1) allow only one hint, or (2) collect and update old hints periodically.
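Following a chain of hints, with the birthnode as a fallback, can be sketched as a simple loop. The table encoding (`("here", address)` versus `("hint", node)`), the hop limit, and the function name are assumptions made for illustration.

```python
def locate(tables, birthnode, vid, start, max_hops=32):
    """Follow hints from `start` until a node holding `vid` is found.

    `tables[n]` maps an ID to ("here", address) or ("hint", other_node).
    Returns (node, path), where `path` lists every node visited.
    """
    node, path = start, [start]
    for _ in range(max_hops):
        binding = tables[node].get(vid)
        if binding is not None and binding[0] == "here":
            return node, path
        if binding is not None:
            node = binding[1]          # follow the hint
        elif node != birthnode:
            node = birthnode           # no binding: fall back to the birthnode
        else:
            raise LookupError("birthnode has no path to the object")
        path.append(node)
    raise LookupError("hop limit reached")


# Object "x" was born on node 0 and has since moved 0 -> 1 -> 2 -> 3.
tables = {
    0: {"x": ("hint", 1)},
    1: {"x": ("hint", 2)},
    2: {"x": ("hint", 3)},
    3: {"x": ("here", 40)},
    4: {},                             # node 4 has never heard of "x"
}
found_from_4 = locate(tables, birthnode=0, vid="x", start=4)
found_from_2 = locate(tables, birthnode=0, vid="x", start=2)
```

The path from node 4 shows the cost of a long chain: one hop to the birthnode and three more along the hints.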


4.3  Calculating  Likely  Nodes  From  Object  IDs 

The operating system provides a system call for finding a likely node that an object resides
on. This ID_TO_NODE call takes the virtual ID of the object and returns a node number,
using the algorithm charted in figure 4.3. The virtual ID is looked up in the translation
table. If it is not there, we have no idea where the object is, so we fall back to the
birthnode. If there is a binding, but the binding is to a hint (an integer value), we return
this hint as the probable residence node. Finally, if the binding is to a physical address,
the object is local, and the local node number is returned.
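The three outcomes map directly onto a three-way branch. In this sketch a virtual ID is a `(birthnode, serial)` pair and the table is a dictionary; both representations are assumptions, as is treating "check the birthnode" as returning the birthnode's number.

```python
def id_to_node(vid, table, local_node):
    """Return the node on which the object named by `vid` probably resides.

    `vid` is a (birthnode, serial) pair; `table` maps IDs either to an int
    (a hint naming another node) or to a (base, length) physical address.
    """
    birthnode, _serial = vid
    binding = table.get(vid)
    if binding is None:
        return birthnode          # no idea where it is: ask its birthnode
    if isinstance(binding, int):
        return binding            # a hint to the probable residence node
    return local_node             # bound to a physical address: it is here


table = {
    (5, 0): (16, 8),   # a local object at base 16, length 8
    (2, 7): 9,         # a hint: the object is rumored to be on node 9
}
local = id_to_node((5, 0), table, local_node=5)
hinted = id_to_node((2, 7), table, local_node=5)
unknown = id_to_node((4, 1), table, local_node=5)
```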


4.4 Virtual To Physical Translations In The Migrant Object World

Now  that  objects  are  allowed  to  wander  aimlessly  across  the  nodes  of  the  Jellybean  Machine, 
virtual  to  physical  address  translations  are  necessarily  slightly  more  sophisticated.  Three 
conditions  can  occur  when  we  attempt  to  translate  a  virtual  ID  into  a  physical  address. 

1.  We  find  a  physical  address  value  for  the  binding 

2.  We  find  a  hint  to  where  the  object  currently  resides 

3. We find no binding for the object

Case  1  is  the  normal  situation.  The  physical  address  associated  with  the  object  ID  is 
returned.  Case  2  implies  that  the  object  is  rumored  to  be  on  a  foreign  node.  We  then 
send  a  request  to  this  node  asking  that  the  object  be  shipped  here  for  processing,  and  we 
suspend  our  process  onto  a  wait  list.  Case  3  occurs  when  a  node  has  no  idea  where  an 
object  resides.  In  this  case,  we  send  a  request  to  the  birthnode  asking  for  the  object.  If  the 
birthnode  doesn’t  know  where  an  object  is,  it  loops,  mailing  messages  to  itself,  assuming 
the  object  is  in  a  state  of  transition  somewhere. 


4.5  Bouncing  Objects 

Note  that  this  method  of  finding  data  objects  may  cause  them  to  bounce  around  from  node 
to  node,  as  different  processors  wish  to  compute  on  them.  This  is  the  direct  result  of  several 
design  decisions:  (1)  each  processor  executes  only  one  task  at  a  time,  (2)  memory  is  not 
shared  among  processors,  (3)  mutable  data  objects  are  not  cached,  and  (4)  an  object’s  data 
lies entirely on one node. The first and second decisions are fundamental to the design of
our machine. We chose the grain size and memory model to provide a moderately fine-grain,
highly scalable processor. We chose not to do object caching because it is expensive to do
in software and difficult on a network-based memory model. It may be possible to provide
coherent caching in the future, however. The final restriction, that an object’s state is
contained on one node only, is for simplicity’s sake, and can be at least partially lifted
by the introduction of “distributed objects” described in a later section.

So,  with  these  characteristics  in  mind,  it  becomes  important  for  us  to  try  to  prevent 
unnecessary  “pinging”  of  objects  from  node  to  node.  One  way  this  is  done  is  by  “sending 
work  to  the  object”  rather  than  “sending  the  object  to  the  work”.  Unfortunately,  this  is 
difficult  to  do  in  the  general  case  due  to  problems  with  transferring  processor  state.  As  a 
compromise, we set the following policy.

1. If we were sending a selector to an object, and the object is not local, we forward the
selector to the location of the object¹.

2.  If  we  were  accessing  a  non-local,  immutable  object,  we  halt,  saving  our  process  state, 
request  a  copy  of  the  object,  and  restart  execution  when  the  copy  arrives. 

3.  If  we  were  accessing  a  non-local,  mutable  object,  we  halt,  saving  our  process  state, 
move  the  object  here,  and  restart  when  it  arrives. 

This  policy  reduces  the  severity  of  the  “pinging”  problem,  because  work  tends  to  accumulate 
at  the  object,  while  at  the  same  time,  allowing  the  object  to  move  if  it  has  to. 
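The policy amounts to a small decision table. The function below is a sketch with hypothetical action names; it encodes only the three rules listed above plus the obvious local case.

```python
def plan_access(kind, is_local, is_mutable):
    """Pick an action for an access, following the three-part policy.

    `kind` is "send" when a selector is being sent to the object and
    "access" for a direct read or write of its contents.
    """
    if is_local:
        return "proceed"
    if kind == "send":
        return "forward-selector"       # send the work to the object
    if not is_mutable:
        return "suspend-and-copy"       # immutable: fetch a copy
    return "suspend-and-move"           # mutable: migrate the object here


decisions = [
    plan_access("send", is_local=False, is_mutable=True),
    plan_access("access", is_local=False, is_mutable=False),
    plan_access("access", is_local=False, is_mutable=True),
    plan_access("access", is_local=True, is_mutable=True),
]
```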


4.6  Details  About  Object  Migration 


This  section  formalizes  the  mechanisms  provided  to  migrate  objects.  When  we  try  to  access 
a  non-local  object,  we  mail  away  to  request  a  copy  of  the  object  or  to  move  the  object 
(depending on whether the object is immutable or mutable, respectively)². When we wish
to  request  a  non-local  object,  the  following  steps  are  taken: 

1.  The  processor  state  is  saved  in  a  context  object,  and  the  context  is  marked  waiting 
for  the  ID  of  the  object  being  requested. 

2.  The  context  is  placed  in  a  resource  wait  table  that  indicates  processes  waiting  on 
objects. 

3. A MIGRATE_OBJECT message is sent to the best-guess residence of the object,
asking that it be migrated to the requesting node, and the process suspends, leaving
the processor able to execute the next message in the queue.

4. This MIGRATE_OBJECT message is forwarded down the chain of hints. If it lands on
a node with no binding for the ID in question, the search continues at the birthnode.
Finally this message arrives at the node the object resides on, and the message handler
is run.

5. If the object in question is marked unmovable, then the message is sent back to
the start of the queue; otherwise the message handler determines whether the object is
mutable and acts accordingly.

• If it is mutable, the bindings are removed from this node, the object is mailed in
an IMMIGRATE_OBJECT message back to the requesting node, and the object
is deleted.

• If the object is read-only, the data is mailed in an IMMIGRATE_COPY message
back to the requesting node.

¹ The class/selector late-binding activation model is discussed in detail in chapter 6.

² Since a process cannot be interrupted by a same-priority message, it does not suffer from
livelock and can always make headway.

6.  These  messages  eventually  arrive  back  at  the  requesting  node. 

• When an IMMIGRATE_OBJECT message arrives, the message handler (1) allocates
the object, (2) marks the object immovable (until it can update the birthnode,
to prevent a race condition where hint updates may occur out of sequence),
(3) copies the data into the object, (4) mails a NOW_RESIDING_AT message to
the previous node of residence, and (5) calls the RESOURCE_ARRIVED system
call, which will queue the restart of the waiting contexts.

• When an IMMIGRATE_COPY message arrives, the handler (1) allocates the object,
(2) marks the object header as a copy, (3) binds the old ID to this new object,
(4) copies the data into the object, and (5) calls the RESOURCE_ARRIVED
system call, which will queue the restart of the waiting contexts (copies can be
collected when storage runs low).

7. The NOW_RESIDING_AT message makes a hint from the current node to the new
node, and mails an UPDATE_BIRTHNODE message to the birthnode of the object,
telling it of the object's new location.

8. The UPDATE_BIRTHNODE message makes a hint to the new location and mails an
OBJECT_MOVABLE message to the location of the new object, passing its ID.

9. The OBJECT_MOVABLE message marks the object movable. Now the object is free
to move again.


Figure  4.4  shows  an  example  of  this  process. 
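The message sequence for a mutable object can be traced with a small single-queue simulation. Everything structural here is an assumption made for illustration: the `Node` class, the single global message queue, the tuple encoding of messages, and the `(birthnode, serial)` form of an ID. Only the message names and what each handler does follow the steps above, and the read-only (copy) path is omitted.

```python
from collections import deque


class Node:
    def __init__(self, number):
        self.number = number
        self.objects = {}     # ID -> {"data": ..., "movable": bool}
        self.hints = {}       # ID -> node the object is rumored to be on
        self.waiting = {}     # ID -> names of contexts suspended on that ID
        self.restarted = []   # contexts queued for restart


def migrate_object(queue, node, vid, requester):
    obj = node.objects.get(vid)
    if obj is None:
        # Follow the hint if there is one, otherwise continue at the birthnode.
        dest = node.hints.get(vid, vid[0])
        queue.append((dest, "MIGRATE_OBJECT", (vid, requester)))
    elif not obj["movable"]:
        queue.append((node.number, "MIGRATE_OBJECT", (vid, requester)))  # retry
    else:
        del node.objects[vid]
        queue.append((requester, "IMMIGRATE_OBJECT", (vid, obj["data"], node.number)))


def immigrate_object(queue, node, vid, data, previous):
    node.objects[vid] = {"data": data, "movable": False}
    queue.append((previous, "NOW_RESIDING_AT", (vid, node.number)))
    node.restarted.extend(node.waiting.pop(vid, []))   # the RESOURCE_ARRIVED step


def now_residing_at(queue, node, vid, new_node):
    node.hints[vid] = new_node
    queue.append((vid[0], "UPDATE_BIRTHNODE", (vid, new_node)))


def update_birthnode(queue, node, vid, new_node):
    node.hints[vid] = new_node
    queue.append((new_node, "OBJECT_MOVABLE", (vid,)))


def object_movable(queue, node, vid):
    node.objects[vid]["movable"] = True


HANDLERS = {
    "MIGRATE_OBJECT": migrate_object,
    "IMMIGRATE_OBJECT": immigrate_object,
    "NOW_RESIDING_AT": now_residing_at,
    "UPDATE_BIRTHNODE": update_birthnode,
    "OBJECT_MOVABLE": object_movable,
}


def run(nodes, queue):
    """Deliver messages in order; return a log of (message, destination)."""
    log = []
    while queue:
        dest, kind, args = queue.popleft()
        log.append((kind, dest))
        HANDLERS[kind](queue, nodes[dest], *args)
    return log


nodes = {n: Node(n) for n in range(3)}
vid = (0, 7)                                       # born on node 0
nodes[1].objects[vid] = {"data": [1, 2, 3], "movable": True}
nodes[0].hints[vid] = 1                            # birthnode knows it moved
nodes[2].waiting[vid] = ["ctx-a"]                  # steps 1 and 2 on node 2
queue = deque([(0, "MIGRATE_OBJECT", (vid, 2))])   # step 3: guess the birthnode
log = run(nodes, queue)
```

The resulting log lists one delivery per step: the request is forwarded from node 0 to node 1, the object immigrates to node 2, and the hint updates travel back through node 1 and the birthnode before the object is marked movable again.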


4.7  Summary 

The addition of a mechanism for object migration adds much more flexibility to the Jellybean
system. Without imposing policy, the migration and copying system provides the
basic mechanism for resource sharing. To alleviate name resolution bottlenecks at object
birthnodes, I designed a system of cycle-free hints to indicate where objects currently lie. It
is not clear how long to allow these chains of hints to be. Long chains of hints would cause
unnecessary network traffic and increase latency. Having single hints would increase the
number of birthnode accesses and require mechanisms for removing old links. The system
currently supports chains of hints.


Chapter  5 

A  Virtually  Addressed  Code 
Execution  Model 


They  shall  mount  up  with  wings  as  eagles; 
they  shall  run,  and  not  be  weary,  and 
they  shall  walk,  and  not  faint 

—  The  Holy  Bible,  Isaiah,  40:31 


At  the  most  primitive  level,  we  could  execute  physically  addressed  blocks  of  machine 
code  by  directly  setting  the  registers,  or  by  sending  primitive  messages.  Unfortunately, 
we have no mechanism to allocate or relocate these blocks of code; they are physically
addressed  and  sedentary.  This  chapter  presents  the  system  mechanisms  that  interact  to 
provide  a  more  flexible,  but  low  overhead  model  for  code  execution  by  taking  advantage  of 
the  virtually-addressed,  object-based  storage  model  we  developed  in  the  last  2  chapters. 

I  will  present  (1)  the  advantages  of  an  object-based  code  model,  (2)  the  mechanisms 
for  executing  object-based  code,  (3)  local  caching  of  methods,  (4)  contexts,  suspension, 
and  waiting  for  resources,  and  (5)  efficient  ways  of  distributing  code  models  across  a  large 
network. 
Figure 5.1: Format of the CALL Message


5.1  Taking  Advantage  of  Object  Storage 

By taking advantage of the object storage and naming system we developed, we are able
to wrap threads of code inside objects and gain all of the benefits of this more powerful
object-based abstraction, of which a few are: (1) dynamic allocation, (2) relocation, even
across nodes, and (3) convenient naming and name resolution. This view of code blocks as
objects (or methods, which is what we call code blocks that are wrapped in objects) allows
us to consider more advanced calling models, such as the ability to conveniently support
remote procedure calls (RPCs) and the flexibility to “send the work to the data” rather
than just the typical mechanism of “bringing the data to the work”.


5.2  An  Overview  of  the  CALL  Message 

Ignoring for the moment the question of initially creating methods, let’s concentrate on the
mechanisms needed to execute them. The operating system provides a primitive
handler for a CALL message. To start a method running, we mail a CALL message to the
node the method resides on¹, passing as arguments the virtual ID of the method to execute,
and any data the method expects as parameters. The format of the CALL message is shown
in figure 5.1. When the CALL message arrives at the node, it first checks if the method is
here. If so, the code is started. If not, rather than forward the message to the birthnode,
we note that

1. Methods are immutable, and therefore can be copied

2. Certain methods might tend to be called often from many nodes

and adopt a policy of copying the method to this node. This way we provide local copies
on many nodes (these can be periodically purged by some appropriate strategy to free up
memory).

¹ Since we build this on top of the virtual, distributed namespace system, we can only send
the message to our best guess of where the method resides.

Once the method is on the node where the CALL message arrived, the message can
start up the method. It does that by

• Translating the ID of the method into its physical address

• Placing this physical address of the code block in A0²

• Placing a 2 in the IP register

These steps will start the processor executing instructions from the method, starting at the
third word. We skip the first two words of the method, because these hold object header
information. The steps of the CALL message are schematically charted in figure 5.2. If
the method somehow relocates on us while we were executing³, the process that relocated
the object will invalidate the A0 register. When our process starts again, it will fetch
an instruction through A0 and cause an invalid address fault. This will run an exception
handler to retranslate the method ID (in ID0) into the physical address (putting it in A0
again), and we will continue as if nothing had happened.
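The startup steps can be sketched with a toy processor that has only the three registers involved. The `Processor` class, the dictionary used as a translation table, and the string "instructions" are hypothetical; the sketch shows only the translation, the load of A0, and the initial IP value of 2.

```python
HEADER_WORDS = 2   # words 0 and 1 of every object hold header information


class Processor:
    def __init__(self):
        self.id0 = None   # virtual ID of the running method
        self.a0 = None    # physical base address of that method
        self.ip = 0       # offset of the next instruction, relative to A0


def handle_call(cpu, table, method_id):
    """Start a locally resident method, as the CALL handler does."""
    cpu.id0 = method_id
    cpu.a0 = table[method_id]     # translate the ID to a physical address
    cpu.ip = HEADER_WORDS         # skip the header: start at the third word


def fetch(cpu, memory):
    """Fetch the next instruction through A0 and advance IP."""
    word = memory[cpu.a0 + cpu.ip]
    cpu.ip += 1
    return word


memory = ["hdr", "id", "op0", "op1"] + [None] * 4
table = {"m1": 0}
cpu = Processor()
handle_call(cpu, table, "m1")
first = fetch(cpu, memory)
```

Because IP is relative to A0, relocating the method only requires a new value in A0; the IP value stays meaningful.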


² A0 always points to the base of the code currently executed, unless the processor is in
absolute mode, where this value is treated always as 0, regardless of what it holds. The IP
register holds the relative offset of the program counter within this code block starting at
A0. (If we are in absolute mode, the IP register acts in effect like an absolute address
rather than a relative address, because absolute mode makes the processor pretend the value
of A0 is 0.)

³ This could be caused by heap compaction, or the method being migrated to another node to
free up space, among other reasons.

Figure 5.2: Flowchart of the CALL Message Handler

5.3 Caching Method Copies


Since  method  code  is  immutable,  we  can  cache  methods,  just  as  we  can  cache  other  read-only 
data.  To  request  a  copy  of  a  method  we: 

1.  Allocate  a  context  object  to  hold  our  processor  state,  so  we  can  restart  later 

2.  Copy  the  processor  state  into  the  context 

3.  Place  the  context  in  the  resource  wait  table  indicating  that  our  context  is  waiting  on 
this  requested  method 

4.  Mail  off,  requesting  a  copy  of  the  method 

5.  When  the  method  arrives,  it  is  placed  on  our  node  and  our  context  is  restarted 

These  cached  copies  will  have  the  copy  bit  set  in  the  object  header  so  that  the  storage 
reclaimer  will  know  that  this  cached  object  is  a  duplicate,  and  can  be  purged  if  space  is 
tight.  Let’s  now  look  in  a  bit  more  detail  at  contexts  and  this  resource  wait  table,  two 
crucial  mechanisms  for  supporting  high  level  execution  control. 


5.4  Contexts 

5.4.1  Why  Do  We  Need  Them? 

Contexts  are  just  objects  that  hold  the  important  state  of  the  processor,  so  the  current  task 
can be halted and later restarted where it left off. In addition, contexts can provide space
for  local  variables  used  in  the  task’s  computation. 

5.4.2  How  Do  We  Make  Them? 

Contexts are allocated by the NEW_CONTEXT system call. The call takes as an argument
the number of additional variables needed, and it returns a context big enough to hold the
minimum necessary processor state plus the additional variables. When a process is done
with a context, it should explicitly deallocate it with the FREE_CONTEXT system call.
Figure 5.3 shows the format of a typical context.

Figure 5.3: Structure of a Typical Context (context address, 5-word context header,
N words of temp space, processor state)

As  with  all  objects,  the  first  two  words  are  used  by  the  object  manager.  The  next 
three  words  are  used  to  hold  an  offset  to  the  processor  state  part  of  the  context  (for  faster 
restarts),  a  pointer  to  the  next  context  in  a  list  of  contexts,  and  a  value  indicating  that  the 
context is waiting on a particular resource. The context then contains some amount of
user-reserved space followed by nine words of processor state. The minimal size of a
context, with no user space, is 14 words.

5.4.3  How  Do  We  Make  Them  ...  Quickly!? 

Since  we  expect  contexts  to  be  used  very  often,  and  since  we  want  method  startup  costs  to 
be  small  and  methods  to  be  short,  we  don’t  want  a  majority  of  our  execution  time  to  be 
spent allocating contexts. To accommodate these constraints, we reuse old contexts rather
than  allocating  new  ones  each  time.  When  a  context  is  deallocated,  it  is  placed  back  on  a 
free  context  list.  The  next  time  a  context  is  requested,  we  try  to  re-use  one  from  the  free 
list,  since  this  will  take  only  a  few  instructions. 

However,  contexts  vary  in  size,  and  we  wouldn’t  want  to  have  to  walk  the  list  each 
time  to  see  if  we  have  a  context  big  enough  to  meet  our  request.  So,  we  only  save  contexts 
that meet a common size. This way, any time we request a context of this “common” size,
we  can  yank  the  first  one  off  of  the  free  list  and  use  it.  The  format  of  the  free  context  list 
is  shown  in  figure  5.4. 

The first context in the free context list is pointed to by the CONTEXT_FREE_LIST
operating system variable. If no contexts are in the free list, the OS variable is set
to NIL. Each context in the free list points to the next context in the list by the context’s
NEXT_CONTEXT slot as shown previously in figure 5.3. The final context in the free list
has its NEXT_CONTEXT slot set to NIL.
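The free list behaves like a stack of same-sized contexts. The sketch below is a simplified model: the value chosen for the “common” amount of user space, the class names, and the allocation counter are assumptions, while the word counts come from the context layout described earlier.

```python
OBJECT_HEADER = 2     # words used by the object manager
CONTEXT_HEADER = 3    # state offset, NEXT_CONTEXT, resource being waited on
STATE_WORDS = 9       # saved processor state
COMMON_TEMPS = 4      # assumed "common" amount of user space


def context_size(temps):
    """Total words in a context with `temps` words of user space."""
    return OBJECT_HEADER + CONTEXT_HEADER + temps + STATE_WORDS


class Context:
    def __init__(self, temps):
        self.temps = temps
        self.next_context = None      # None plays the role of NIL


class ContextPool:
    def __init__(self):
        self.free_list = None         # plays the role of CONTEXT_FREE_LIST
        self.fresh_allocations = 0

    def new_context(self, temps):
        if temps == COMMON_TEMPS and self.free_list is not None:
            ctx = self.free_list                  # pop the head: no list walk
            self.free_list = ctx.next_context
            ctx.next_context = None
            return ctx
        self.fresh_allocations += 1
        return Context(temps)

    def free_context(self, ctx):
        if ctx.temps == COMMON_TEMPS:             # only common-size contexts are kept
            ctx.next_context = self.free_list
            self.free_list = ctx
        # contexts of other sizes would simply go back to the heap


pool = ContextPool()
c1 = pool.new_context(COMMON_TEMPS)
pool.free_context(c1)
c2 = pool.new_context(COMMON_TEMPS)   # reuses c1
odd = pool.new_context(7)
pool.free_context(odd)                # not kept on the free list
```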


Figure 5.4: The Free Context List

5.4.4 Restarting a Context

The operating system provides one primitive message (RESTART_CONTEXT) and two
system calls (XFER_ID and XFER_ADDR) to restart a context. The system calls take
either an ID or a physical address of a context and restart it, copying the processor state
from the context to the processor registers. The restart context message takes a context ID
and transfers control to it by calling the XFER_ID system call on the context ID.

5.5  The  Resource  Wait  Table 

The  resource  wait  table  is  a  system  data  structure  that  indicates  which  contexts  are  waiting 
for  which  services.  It  consists  of  two  parts.  The  first  part  of  the  wait  table  is  a  fixed  size 
associative  table  that  binds  resource  IDs  to  waiting  contexts.  Figure  5.5  shows  a  portion  of 
a  hypothetical  table.  We  see  several  contexts  waiting  for  ID1,  one  context  waiting  for  ID2, 
and  the  rest  of  the  slots  are  empty.  Empty  slots  are  set  to  NIL.  When  a  resource  arrives, 
the  wait  table  is  searched,  and  the  contexts  in  the  list  bound  to  the  ID  are  restarted. 

Searching this table is fast, but unfortunately, we cannot bound the number of
entries  that  try  to  occupy  the  table.  At  some  time,  we  may  run  out  of  room.  When  this 
happens,  we  resort  to  a  slower  form  of  data  structure  and  link  the  contexts  waiting  on 
resources  in  a  list  called  the  resource  overflow  list.  If  we  don’t  find  a  binding  in  the  table, 
we begin searching the list of contexts. Since each context has a RESOURCE_NEEDED
slot,  we  can  always  tell  what  resource  the  context  is  waiting  for.  This  provides  us  a  way  to 
continue  if  the  table  becomes  full.  By  sizing  the  table  appropriately,  it  may  be  possible  to 
limit  use  of  the  overflow  list  to  a  minimum. 
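A two-level structure of this kind can be sketched as a bounded dictionary backed by a list. The slot count, the dictionary-based contexts, and the method names are hypothetical. One deliberate difference from the description above: this sketch scans the overflow list on every arrival, not only on a table miss, which is a conservative choice so that a context left in the overflow list is never missed.

```python
class WaitTable:
    def __init__(self, slots=2):
        self.slots = slots
        self.table = {}        # resource ID -> list of waiting contexts
        self.overflow = []     # contexts that did not fit in the table

    def wait(self, resource_id, context):
        context["resource_needed"] = resource_id
        if resource_id in self.table:
            self.table[resource_id].append(context)
        elif len(self.table) < self.slots:
            self.table[resource_id] = [context]
        else:
            self.overflow.append(context)      # table full: fall back to the list

    def resource_arrived(self, resource_id):
        """Return every context that was waiting on `resource_id`."""
        ready = self.table.pop(resource_id, [])
        still_waiting = []
        for ctx in self.overflow:
            if ctx["resource_needed"] == resource_id:
                ready.append(ctx)
            else:
                still_waiting.append(ctx)
        self.overflow = still_waiting
        return ready


wt = WaitTable(slots=2)
wt.wait("ID1", {"name": "c1"})
wt.wait("ID1", {"name": "c2"})
wt.wait("ID2", {"name": "c3"})
wt.wait("ID3", {"name": "c4"})      # both slots are taken: goes to overflow
from_overflow = [c["name"] for c in wt.resource_arrived("ID3")]
from_table = [c["name"] for c in wt.resource_arrived("ID1")]
```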


Figure 5.6: The Resource Wait Overflow List

Figure 5.7: A Parallel Resource Request Bottleneck in a 3 x 3 Network


5.6 Removing Method Caching Bottlenecks with Distribution Trees

The current scheme for method caching implies that in many cases, nodes wanting methods
will have to ask the birthnode of the method (or at least the residence node) for a copy.
If many nodes simultaneously need the same method (as will likely happen with highly
parallel execution), then the birthnode will be deluged with method requests which it can
only handle sequentially. These bottlenecks could degrade performance considerably. For
example, figure 5.7 shows a network of 9 processing nodes. Suppose nodes 2 through 9 all
requested a method copy from node 1. Node 1 would receive a barrage of 8 requests for the
method, which would eliminate all parallelism, since it could consider each request only
sequentially.

One  way  to  reduce  the  threat  of  performance  degrading  bottlenecks  is  to  set  up  a 
distribution  hierarchy,  so  that  each  node  requests  resources  from  its  local  distribution  center 
(the  distribution  hierarchies  are  different  for  different  resources).  Each  of  these  local  centers 
would  make  requests  to  its  superior,  all  the  way  up  to  the  master  resource  center.  We  can 
use  this  type  of  distribution  graph  to  help  in  requesting  method  copies  (or  copies  of  any 
type  of  immutable  data  for  that  matter). 

Take  again  the  3  x  3  node  network  example,  where  8  nodes  request  a  method  from 
node  1,  but  this  time  impose  a  distribution  bureaucracy  like  that  shown  in  the  tree  in  figure 
5.8. This time, node 1 only has to handle 3 messages, from nodes 2, 4 and 5. Each of these
nodes serves as a local distribution center for the remaining nodes. Node 2 services nodes 3
and 6, node 4 services nodes 7 and 8, and node 5 services node 9. In this manner we have
permitted  more  parallelism  to  continue,  as  well  as  limiting  the  burden  on  node  1  (which 
could  cause  queue  overflow,  network  blocking,  and  other  conditions  where  performance 
degrades  considerably). 
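
To make the comparison concrete, the following Python sketch counts how many requests each node has to service in the flat scheme and under the tree just described. The parent table is transcribed from the description of figure 5.8 above; the function name is illustrative only.

```python
from collections import Counter

# Local distribution center of each node, as described for figure 5.8.
TREE_PARENT = {2: 1, 4: 1, 5: 1, 3: 2, 6: 2, 7: 4, 8: 4, 9: 5}

# Flat scheme: every node asks the birthnode (node 1) directly.
FLAT_PARENT = {node: 1 for node in range(2, 10)}

def requests_serviced(parent_of):
    """Return a Counter mapping each node to the number of requests it receives."""
    return Counter(parent_of.values())

flat = requests_serviced(FLAT_PARENT)
tree = requests_serviced(TREE_PARENT)
print("flat:", dict(flat))
print("tree:", dict(tree))
```

In the flat case node 1 services all 8 requests itself; with the tree it services 3, and the remaining 5 are spread over nodes 2, 4 and 5, which can work in parallel.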

Let’s  now  discuss  some  ways  that  a  distribution  tree  method  caching  scheme  can  be 
implemented in the Jellybean Machine system software. First, what are the constraints we
are  working  under? 

•  The  distribution  tree  edges  must  be  easily  computable 

•  We need to make reasonable choices for branching factor versus tree depth. Too high a
branching factor might create bottlenecks, but too low a branching factor would tend
to cache unnecessary copies, and suffer long latency since the birthnode would be many edges
away from the requesting node.

•  We  would  like  to  have  significantly  different  trees  for  different  resources.  Different 
methods  should  have  different  distribution  hierarchies,  again  to  decrease  bottlenecks, 
and  to  distribute  resources  more  thoroughly. 

One  fairly  simple  first  attempt  at  a  distribution  tree  formula  might  be  to  go  to  the 
distribution  center  that  is  halfway  between  the  current  node  and  the  birthnode  in  terms 
of hops. In other words, to find the next regional distribution center, given the birthnode
coordinates (x_b, y_b) and our current coordinates (x_c, y_c), we would calculate the halfway
coordinates (x_h, y_h) by:

    x_h = (x_b + x_c) / 2,    y_h = (y_b + y_c) / 2

This is in fact the algorithm used to create the distribution tree in figure 5.8. Figure 5.9
shows several distribution trees created by this algorithm for networks of various sizes and
various birthnodes. This method creates trees with depth at most log2 m + 1 for a network
with a maximum dimension of m nodes. So, for a reasonably sized machine of 4096 nodes
(64 x 64) we would at most have to traverse log2 64 + 1, or 7, edges of the distribution tree.
For enormous systems, say 1K nodes on a side, the tree depth will be only 11.
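
The halfway rule can be sketched in a few lines of Python. Two details here are assumptions rather than something the text states: nodes are numbered row by row with node 1 at coordinates (0, 0), and a fractional midpoint is rounded toward the birthnode (so that every step makes progress). With those assumptions the sketch reproduces the tree described for figure 5.8 and stays within the depth bound quoted above.

```python
def halfway(current, birth):
    """Next distribution center: midpoint of current and birth, rounded toward birth."""
    def step(c, b):
        half = (abs(b - c) + 1) // 2          # half the distance, rounded up
        return c + half if b > c else c - half
    return (step(current[0], birth[0]), step(current[1], birth[1]))

def path_to_birthnode(node, birth):
    """List of nodes visited when following distribution-tree edges up to birth."""
    path = [node]
    while node != birth:
        node = halfway(node, birth)
        path.append(node)
    return path

def node_number(xy, width):
    """Row-major node number starting at 1 (an assumed numbering)."""
    x, y = xy
    return y * width + x + 1

birth = (0, 0)
parents = {
    node_number((x, y), 3): node_number(halfway((x, y), birth), 3)
    for y in range(3) for x in range(3) if (x, y) != birth
}
print(parents)
print(len(path_to_birthnode((63, 63), birth)) - 1, "edges from the far corner of 64 x 64")
```

Each step halves the remaining distance in every dimension, which is where the logarithmic depth comes from.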
Chapter 6


System  Support  of  a 
Type-Dispatched  Calling  Model 


We  never  sent  a  messenger  save  with 
the  language  of  his  folk,  that  he 
might  make  the  message  clear  for  them 

—  The  Koran,  IS:  11 


One  of  the  most  important  aims  of  the  Jellybean  Machine  is  to  provide  a  concurrent 
processor  that  efficiently  supports  object-oriented,  late-binding  procedure  activations.  This 
chapter introduces the idea of message-passing and late-binding programming methodologies,
and discusses the system services in the Jellybean Machine operating system that
support  this  manner  of  programming. 

6.1  Message-Passing and Object-Oriented Languages

There  has  been  much  interest  during  the  past  few  years  in  “object-oriented”  programming. 
Though  this  term  is  not  particularly  precise,  it  does  describe  a  fairly  cohesive  set  of  languages 
exhibiting behavior markedly different from the typical Algol-like programming style. There
are  two  characteristics  in  particular  that  languages  typically  categorized  as  object-oriented 
share. 

First  of  all,  operations  tend  not  to  be  thought  of  as  functions  applied  to  data  objects, 
as  they  are  in  Algol  derivatives.  Instead,  data  objects  are  “personified”  as  “actors”  that 
receive  requests  made  of  them.  These  requests  are  made  by  “sending  a  message”  to  an 
object  called  the  receiver  of  the  message.  The  operation  that  was  requested  of  the  object 
is typically called the selector, since it selects the operation to be performed. So, where a
standard Algol-like language might calculate the determinant of a matrix m by

determinant(m);

an object-oriented implementation might look something like

(send m 'determinant)

We call this concept of performing operations by sending selectors to objects the message-passing
paradigm. This paradigm turns out to be a very convenient model of computation.

The second characteristic of object-oriented languages that makes them appealing is
the fact that the operations on different data types can have the same names. This allows
us, for example, to have an 'area selector for circle data types, as well as an 'area selector for
polygon data types. In many other languages this would cause a naming conflict, requiring
us to set up an explicit naming convention, such as calling circle_area() and polygon_area()
routines on objects of the proper type.

But, more importantly than just saving us the hassle of naming conflicts, object-oriented
languages actually decide which procedure to run for a certain data type. In other
words,  when  an  ’area  selector  arrived  at  an  object,  the  system  would  decide  whether  this 
object  is  a  circle  or  a  polygon  and  automatically  run  the  correct  procedure.  In  addition, 
if  the  receiver  of  the  ’area  selector  was  not  a  data  type  that  supported  the  area  operation 
(such as an integer), then an error would be reported by the system. In Algol-like languages,
it is the burden of the programmer to know the type of the object he is dealing with, so he
can call the proper operation. This is crucial in many symbolic languages with loose type-checking,
like Lisp, where we can have lists of many different types of objects¹. This is called
a late-binding activation since we don't decide what routine will be run at compile-time,
but instead wait until later, when the message send is actually done.

Operations  with  the  same  name  and  semantically  similar  meaning  supported  by 
various  data  types  are  called  generic  operations  since  these  operations  represent  the  generic 
behavior  the  programmer  wants  to  accomplish  (add  things,  draw  things,  calculate  areas  of 
things).  The  specific  behavior  is  calculated  at  run-time  once  we  know  the  data  type  of  the 
object  (called  the  class  of  the  object),  and  the  selected  operation,  by  a  process  known  as 
class-selector  lookup. 
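
As an illustration only, class-selector lookup can be mimicked in Python with a table keyed on the receiver's class and the selector. The names `METHODS` and `send`, and the shoelace-formula polygon area, are choices made for this sketch and are not part of the Jellybean system.

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

class Polygon:
    def __init__(self, vertices):
        self.vertices = vertices          # list of (x, y) pairs in order

def circle_area(circle):
    return math.pi * circle.radius ** 2

def polygon_area(polygon):
    pts = polygon.vertices
    # Shoelace formula over consecutive vertex pairs, wrapping around at the end.
    twice = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(twice) / 2

# (class, selector) -> method: the same selector names a different routine per class.
METHODS = {(Circle, "area"): circle_area, (Polygon, "area"): polygon_area}

def send(receiver, selector, *args):
    """Late-binding send: the routine is chosen at run time from the receiver's class."""
    method = METHODS.get((type(receiver), selector))
    if method is None:
        raise LookupError(f"{type(receiver).__name__} does not support {selector!r}")
    return method(receiver, *args)

for shape in [Circle(1.0), Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])]:
    print(type(shape).__name__, send(shape, "area"))
```

Sending "area" to an integer raises an error here, mirroring the system-reported error described above for receivers that do not support the operation.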

So, object-oriented languages have two main components:

1.  Procedures are activated by the message-passing paradigm rather than a more applicative
model of programming.

2.  Each data type has its own set of supported operations, where names can be the same
as in other data types, and may represent generic operations over varied data types.
Activations are caused by late-binding sends which look up the specific operation to run
based on the class of the object receiving the message (the receiver) and the selected
operation (the selector).


Our  goal  now  is  to  provide  a  system  substrate  that  will  efficiently  and  conveniently  support 
these  aims. 

¹ A good example of this is an object-oriented drawing program, where we have a list of many different
types of objects that are in the current picture. A convenient way to refresh the screen in an object-oriented
system is to send a 'draw message to each object in the list. Based on the data type of each object at
run-time, the appropriate routine (circle draw, rectangle draw, text draw, etc.) is activated.
Figure 6.1: Format of the SEND Message


6.2  Late-Binding  Send  Execution  Support 

The  next  task  of  the  operating  system  is  to  provide  a  mechanism  to  simulate  the  message- 
passing  paradigm.  We  already  have  network  communication  hardware  that  allows  data  to 
be  sent  between  nodes.  We  also  have  a  global  object  namespace  provided  by  the  virtual 
memory extensions. Together, we can use these components to implement the message-passing
execution model.

To  do  this,  we  implement  one  more  primitive  message,  the  SEND  message  handler 
(not  to  be  confused  with  the  SEND  machine  instruction).  This  primitive  message  handler 
acts in the object-oriented manner we showed earlier. Figure 6.1 shows the significance of
the different words of the message. The first word is the address of the SEND message
handler,  the  second  word  is  the  selector,  the  third  word  is  the  receiver.  The  rest  of  the 
words  are  arguments,  and  information  about  where  to  reply  to. 

When the SEND message arrives on the node that the receiver resides on (we forward
this SEND message to wherever the receiver resides) the primitive message handler is
started. Figure 6.2 shows a flow chart that describes how the SEND message handler works.
It first picks the class out of the receiver object (so we know what data type the receiver is).
We then merge the class and selector together into a class/selector word (shown in figure
6.3). Now that we have the class and selector, we try to see if there is a class/selector ->
method ID binding in the cache. If so, we start the method with the CALL message as
discussed in the previous chapter. If not, we need to look up the binding.
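
The handler's lookup path can be sketched as follows. This is a Python model, not MDP code: the placement of the class in the upper 16 bits of the word, the dictionary used as the cache, and the `lookup_method` callback standing in for the fallback lookup are all assumptions made for the sketch.

```python
def class_selector_word(class_id, selector_id):
    """Merge a 16-bit class and a 16-bit selector into one 32-bit word."""
    if not (0 <= class_id < 1 << 16 and 0 <= selector_id < 1 << 16):
        raise ValueError("class and selector must each fit in 16 bits")
    return (class_id << 16) | selector_id   # assumed layout: class in the high half

class SendHandler:
    def __init__(self, lookup_method):
        self.lookup_method = lookup_method  # called only on a cache miss
        self.cache = {}                     # class/selector word -> method ID
        self.lookups = 0

    def method_id(self, class_id, selector_id):
        word = class_selector_word(class_id, selector_id)
        if word not in self.cache:
            self.lookups += 1
            self.cache[word] = self.lookup_method(class_id, selector_id)
        return self.cache[word]

# Hypothetical binding table: class 1, selector 2 is bound to method ID 77.
table = {(1, 2): 77}
handler = SendHandler(lambda c, s: table[(c, s)])
print(hex(class_selector_word(1, 2)), handler.method_id(1, 2), handler.method_id(1, 2))
print("lookups performed:", handler.lookups)
```

The second send for the same class and selector is answered from the cache, so the slower lookup runs once.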

At the current time, we do not have enough insight into the characteristics of machine
behavior to feel comfortable locking down the class/selector lookup algorithm. For
this reason, we provide the lookup routine in a method. We insist that this method is allocated
before any others so it always has the same method ID. This LookupMethod method
takes  the  class  and  selector,  and  consults  some  distributed  system  table  to  find  the  method 
ID  corresponding  to  this  class  and  selector. 


6.3  Loading Class/Selector Methods into the System

Let's now briefly look at how the class/selector method information is loaded into the Jellybean
system. Figure 6.4 shows the schema for how the compiler and run-time environment
will interact with the Jellybean Machine processing network. The compiler is responsible
for generating class and selector numbers and for compiling the source language into MDP
machine code. A certain node of the network is picked for the method to reside on by some
distribution policy. The method data as well as the class and selector that this method
represents are sent to this chosen node by the NEW_METHOD message. The format of a
NEW_METHOD message is shown in figure 6.5.

When a NEW_METHOD message arrives at a node, the NEW_METHOD message
handler begins executing. It makes an object to hold the method, and copies the code from
the message into the object. The NEW_METHOD handler then calls the InstallMethod
method which takes the class, selector, and method ID and makes the bindings in the
class/selector -> method ID data structures.
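
A minimal Python model of this handler is given below. The class name `MethodNode`, the sequential method IDs and the dictionary-based binding table are assumptions for illustration; the text deliberately leaves the real data structures open.

```python
class MethodNode:
    """One node's view of method storage and class/selector bindings."""

    def __init__(self):
        self.objects = {}         # method ID -> copy of the method's code words
        self.bindings = {}        # (class, selector) -> method ID
        self.next_method_id = 0   # assumed ID allocation: simple counter

    def install_method(self, class_id, selector_id, method_id):
        """Stand-in for InstallMethod: record the binding."""
        self.bindings[(class_id, selector_id)] = method_id

    def handle_new_method(self, class_id, selector_id, length, code):
        """Stand-in for the NEW_METHOD handler."""
        method_id = self.next_method_id
        self.next_method_id += 1
        # Make an object for the method and copy the code out of the message.
        self.objects[method_id] = list(code[:length])
        self.install_method(class_id, selector_id, method_id)
        return method_id

node = MethodNode()
mid = node.handle_new_method(class_id=1, selector_id=2, length=3, code=[10, 20, 30, 99])
print(mid, node.objects[mid], node.bindings)
```

Only `length` words are copied, so trailing words in the message (here the 99) do not become part of the method object.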

Specification of the class/selector -> method ID data structures has been ignored
without attempts at subtlety. We do not have enough insight to definitely specify the best
format for these tables. We can talk a bit about the issues involved. (1) We should be
able to take a class/selector word and efficiently find the corresponding method ID. (2) The
table should be distributed around the network in a way to minimize bottlenecks.

Figure 6.2: Flowchart of the SEND Message Handler

Figure 6.3: Class/Selector Word Format (two fields, 16 bits each)

Figure 6.4: A Coarse View of the Compiler/Machine Interface

Figure 6.5: Format of the NEW_METHOD Message (fields: NEW_METHOD routine address, class, selector, length of code, code)

A reasonable way of doing this would be to apply some “bit-twiddling” function
to the class/selector words to decide what node is responsible for knowing their bindings.
The  actual  data  structures  could  be  hashed,  or  perhaps  each  class  would  have  an  object 
that  holds  the  method  IDs  for  every  selector.  One  annoying  problem  with  any  approach 
is  the  boot-strapping  problem.  We  need  to  know  how  we  can  get  to  the  data.  Because  of 
the  added  indirection  through  the  LookupMethod  and  InstallMethod  handlers  we  have  the 
flexibility  to  try  several  approaches  and  test  their  performance  in  the  future. 
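
One possible “bit-twiddling” function is sketched here purely as an example. The particular mixing step (fold the two halves together, multiply by an odd constant, keep the upper bits) is a hypothetical choice, not something the text prescribes; any function that every node can compute locally would address the boot-strapping concern in the same way.

```python
def home_node(class_id, selector_id, node_count):
    """Map a class/selector pair to the node assumed to hold its binding."""
    word = ((class_id & 0xFFFF) << 16) | (selector_id & 0xFFFF)
    folded = word ^ (word >> 16)                      # mix class bits into selector bits
    mixed = (folded * 0x9E3779B1) & 0xFFFFFFFF        # multiplicative scramble, 32 bits
    return (mixed >> 16) % node_count                 # use the better-mixed upper bits

nodes = {home_node(c, s, 4096) for c in range(64) for s in range(64)}
print(len(nodes), "distinct home nodes for 4096 class/selector pairs")
```

Because the function depends only on the class/selector word and the machine size, any node can find the responsible node without consulting a table first.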


6.4  Returning  Values 

Return  values  can  be  sent  with  the  REPLY  message.  This  message  takes  the  context  ID 
to reply to, the slot number of the context to fill, and one word of reply data. The reply
data  is  passed  by  value  if  it  is  a  primitive  data  word,  or  by  reference  if  an  object  is  to  be 
returned. 
6.5  Summary


The class/selector calling model is a convenient mechanism for invoking tasks. By implementing
it efficiently in the operating system kernel, we can guarantee an efficient implementation.
To provide extensibility, we provide hooks to the LookupMethod and InstallMethod
handlers, so these routines can be reconfigured independently of the rest of the kernel.


Chapter  7 


Storage  Reclamation  in  the 
Jellybean  Machine 


But  virtue,  as  it  never  will  be  moved, 
Though  lewdness  court  it  in  a  shape  of  heaven, 
So  lust,  though  to  a  radiant  angel  linked, 
Will  sate  itself  in  a  celestial  bed, 
And  prey  on  garbage 


—  Shakespeare,  in  Hamlet  I,  V.  53 


7.1  Introduction 

The  successful  performance  of  our  machine  relies  on  the  fact  that  sufficient  parallelism 
exists on the grain of methods. In order for this to happen, it is important that data-dependencies
to shared objects are minimized, by adopting a more functional approach,
where methods interact by value rather than by reference, as much as possible. This situation
promotes a large number of small, short-lived objects. Because of the minute amount
of memory per processing node, an efficient storage reclamation mechanism becomes
an important facet. The characteristics of our system, however, cause many straightforward
methods of storage management to break down. In this discussion we will examine
some  of  the  important  properties  of  the  Jellybean  Machine,  and  the  ways  these  properties 
influence  reclamation.  The  rest  of  this  chapter  provides  a  discussion  of  the  issues  pertaining 
to  reclamation  on  the  Jellybean  Machine,  and  a  possible  first-cut  at  a  garbage  collection 
algorithm. 

7.2  Automatic  Collection  is  Desirable 

Because  the  system  is  object  oriented,  and  because  we  have  a  small  memory  with  frequent 
allocations,  object  reclamation  is  important.  Because  objects  can  be  shared  in  complex 
ways,  and  because  of  the  high  level  programming  model  we  wish  to  support,  we  wish  most 
object  deallocations  to  be  handled  automatically  by  a  “garbage  collector”  that  searches  for 
objects  that  are  no  longer  in  use  (i.e.  there  are  no  pointers  to  the  object  anywhere)  and 
deallocates  them  when  necessary. 


7.3  Choosing  a  Collection  Approach 

Several characteristics of the Jellybean Machine will guide us in the choice of garbage
collection.  Let’s  remind  ourselves  of  the  character  of  the  machine. 

7.3.1  Memory  Organization 

The  memory  in  a  Jellybean  processor  is  small,  and  it  is  local  to  that  processor.  Memory 
allocation is done in a simple contiguous manner. Compaction can be done in parallel
very quickly. Memory objects are segment-based and are given unique object id's. In
addition, these object id's are concatenated with a birth node number to provide a global
virtual address. The virtual to physical translation mechanism uses caching to improve
name  resolution,  but  this  relies  on  locality.  Random  access  to  many  addresses  could  be 
very  expensive. 

7.3.2  Addressing  System  and  Network  Topology 

The  Jellybean  Machine  uses  a  distributed  memory  to  provide  “site  autonomy”  [LS80]  in 
order  to  perform  local  operations  very  fast,  and  avoid  memory  conflicts.  But,  the  tradeoff  is 
that  foreign  accesses  will  be  very  costly,  involving  a  message  send  mechanism  that  is  at  least 
an  order  of  magnitude  slower.  In  addition,  distributed  memory  can  require  synchronization, 
and  the  delays  of  network  communication  may  make  certain  synchronization  conditions 
impossible.  The  network  may  cause  bottlenecks  to  occur  if  too  many  messages  are  sent  to 
one  place,  and  may  hold  data  in  transit.  The  network  latency  may  also  be  a  factor. 

7.3.3  Garbage  Collection  Character 

Garbage  collectors  take  on  various  different  characters.  The  common  approach  of  reference 
counting collection doesn't appear to be feasible in the Jellybean Machine because (1)
it  cannot  collect  cyclic  data  structures,  (2)  every  pointer  change  will  require  a  (possibly 
remote)  object  access,  and  (3)  we  are  not  always  aware  when  “dead”  pointers  get  changed. 
For  these  reasons,  we  decided  to  attempt  some  variant  of  a  pointer  chasing  garbage  collection 
mechanism.  The  next  section  describes  the  implementation  of  a  pointer  chasing  garbage 
collector  for  our  machine  in  some  detail. 
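
Point (1) can be seen in a small Python model that keeps explicit reference counts (it does not rely on Python's own memory management). The `Obj` class and helper names are made up for the sketch.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []    # objects this one points to
        self.count = 0    # number of pointers to this object

def add_ref(src, dst):
    """Store a pointer to dst inside src."""
    src.refs.append(dst)
    dst.count += 1

def release(obj, freed):
    """Drop one pointer to obj; at zero, free it and release what it points to."""
    obj.count -= 1
    if obj.count == 0:
        freed.append(obj.name)
        for child in obj.refs:
            release(child, freed)

def run(cyclic):
    a, b = Obj("a"), Obj("b")
    a.count += 1           # a pointer held by some root, e.g. a context slot
    add_ref(a, b)
    if cyclic:
        add_ref(b, a)      # b points back at a, closing the cycle
    freed = []
    release(a, freed)      # the root pointer is overwritten
    return freed

print("chain:", run(cyclic=False))
print("cycle:", run(cyclic=True))
```

In the chain, dropping the root pointer frees both objects. In the cycle, each object still holds a pointer to the other, so neither count reaches zero and nothing is reclaimed even though both are unreachable.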


7.4  A  Pointer  Chasing  Garbage  Collector 

There  are  several  properties  that  we  would  like  our  garbage  collector  to  have. 
•  The collector should be efficient in terms of time and message sends. We do not want
the  queues  of  all  nodes  to  overflow  with  collection  messages. 

•  The  collector  should  run  in  the  background  or  incrementally,  for  two  reasons.  First, 
we  wish  to  take  advantage  of  processor  idle  time  so  that  we  can  squeeze  as  much 
computation  out  of  our  processor  as  possible.  Secondly,  we  would  like  to  avoid  the 
situation  where  our  machine  runs  for  a  while  and  then  “hangs  up”  for  an  hour  while 
garbage  collection  occurs. 


7.4.1  The  General  Idea 


Most of the work on pointer chasing garbage collection algorithms to date is targeted at
sequential or shared-memory machines with large virtual memories. The standard algorithm
is based on the copying collector proposed by Baker. This has been expanded into
incremental collectors and has been tuned to various object lifespans, with a good degree
of success. Still, these approaches are targeted at a genre of machine of a radically different
character than the J-Machine. With an admitted scarcity of knowledge in distributed
collection, the rest of this chapter serves only to sketch a simple vision of such a collector
[Tot88], and some of the problems that are faced.

A simple collector would involve recursive marking by message sends, and would
compact the heap rather than scavenge or copy, due to the small amount of memory
per chip. The phases of this simple collector would be:

Desire The desire phase occurs when some node or nodes have a desire to garbage collect.
Perhaps  a  node  or  a  certain  number  of  nodes  have  run  out  of  memory.  Perhaps  this 
occurs  on  a  time  count. 

Init  The  initialization  phase  is  where  objects  are  marked  unreferenced  initially,  as  well  as 
setting  any  necessary  variables. 

Marking The marking phase does a recursive descent of the reference tree starting at the root
set,  marking  reachable  objects  with  the  reachable  tag. 

Sweeping When marking is done, the memory can be compacted by “sweeping” the good objects
back toward the bottom of the heap, and changing their virtual -> physical bindings.
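
The init, marking and sweeping phases can be sketched for a single node's heap as below. This is a simplification under stated assumptions: the heap is a Python list of records in address order, marking uses an explicit stack on one node, whereas the collector described here would mark across nodes by message sends.

```python
def collect(heap, roots):
    """heap: list of {'id', 'size', 'refs'} records in address order.

    Returns the compacted heap and the new id -> address bindings.
    """
    by_id = {obj["id"]: obj for obj in heap}

    # Init: every object starts out unreferenced.
    reachable = set()

    # Marking: descend from the root set, tagging everything we can reach.
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in reachable:
            continue
        reachable.add(oid)
        stack.extend(by_id[oid]["refs"])

    # Sweeping: slide reachable objects toward the bottom of the heap
    # and record their new addresses.
    compacted, bindings, address = [], {}, 0
    for obj in heap:
        if obj["id"] in reachable:
            bindings[obj["id"]] = address
            address += obj["size"]
            compacted.append(obj)
    return compacted, bindings

heap = [
    {"id": "A", "size": 4, "refs": ["C"]},
    {"id": "B", "size": 2, "refs": []},      # unreachable: will be reclaimed
    {"id": "C", "size": 3, "refs": ["A"]},   # cycle with A, still reachable
]
compacted, bindings = collect(heap, roots=["A"])
print([obj["id"] for obj in compacted], bindings)
```

Object C moves down into the space B occupied, and only its binding changes; pointers to C by ID stay valid, which is why compaction only needs to update the virtual -> physical table.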
7.4.2  Problems

Synchronization and “Travelling References”

A major problem in garbage collection across a communication medium is lack of synchronized,
instantaneous transmission. This shows itself in garbage collection in a few ways.
One  of  the  more  annoying  problems  is  how  to  be  sure  that  the  last  pointer  to  an  object 
isn’t  in  transit  when  the  garbage  collector  comes  along.  The  garbage  collector  doesn’t  see 
any  pointers  in  the  network,  so  an  object  may  be  deleted  because  a  pointer  was  “travelling” 
between  nodes  where  it  can’t  be  noticed.  We  can  refer  to  this  as  the  travelling  reference 
problem.  Figure  7.1  shows  a  portion  of  a  network  of  processors,  where  an  ID  of  an  object 
is  in  the  network  when  the  collector  is  run. 

An  obvious  way  to  resolve  this  situation  is  to  prevent  all  upcoming  message  sends 
during  collection,  so  that  no  other  pointers  are  mailed  into  the  network,  and  then  to  wait 
until  all  messages  in  transit  have  landed  in  a  queue.  We  can  tell  when  all  messages  have 
landed  by  either  waiting  a  length  of  time  we  know  to  be  longer  than  the  maximum  latency 
from  the  most  distant  nodes,  or  by  sending  “scout”  or  “bulldozer”  messages  down  the 
network  dimensions.  When  all  these  “bulldozer”  messages  arrive,  they  will  have  pushed  all 
other  messages  out  of  the  way,  and  the  network  will  be  empty. 
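
The travelling reference problem and the drain-the-network remedy can be illustrated with a toy model. Everything here is a simplification chosen for the sketch: the network is a plain list of in-flight messages, sends are assumed to be already disabled, and "visible" IDs are just those held in node roots or landed queue entries.

```python
def visible_ids(nodes):
    """IDs a collector can see: node roots plus IDs in messages that have landed."""
    seen = set()
    for node in nodes.values():
        seen.update(node["roots"])
        for message in node["queue"]:
            seen.update(message["ids"])
    return seen

def drain(network, nodes):
    """Deliver every in-flight message into its destination queue."""
    while network:
        message = network.pop(0)
        nodes[message["dest"]]["queue"].append(message)

nodes = {
    0: {"roots": {"a"}, "queue": []},
    1: {"roots": {"b"}, "queue": []},
}
# The only pointer to object "x" is inside a message still travelling to node 1.
network = [{"dest": 1, "ids": {"x"}}]

before = visible_ids(nodes)
drain(network, nodes)
after = visible_ids(nodes)
print("x visible before drain:", "x" in before)
print("x visible after drain: ", "x" in after)
```

A collector that marks before the drain would treat "x" as garbage; once the message has landed in a queue, the reference can be found and the object survives.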

Problems  With  Disabling  Sends 

In  order  to  prevent  the  travelling  reference  problem,  we  have  to 

•  Disable sends so no new references enter the network.

•  Wait for all messages in the network to land.

But, we have no explicit mechanism in the MDP processing node to disable sends¹. If we
did, we could allow the processors to run until they tried to execute one of these disabled
instructions. When this happened, a fault could occur and some manner of process halting
could occur (such as saving a context for the process for later re-starting²).

¹ Or, more preferably, a mechanism that would disable any sends that would cause a reference to be
mailed into the network; all other messages could continue.

Figure 7.1: Object ID Travelling in Network

A  possible  way  to  resolve  this  problem  at  first  might  be  to  place  guards  in  certain 
high-level  execution  handlers  such  as  SEND  and  CALL.  These  handlers  are  run  when  a 
SEND  or  CALL  message  (two  messages  that  ask  a  node  to  start  executing  a  method) 
arrives.  Inside  these  handlers  we  could  have  a  guard  that  would  defer  the  execution  of 
the  method  until  collection  finishes.  This  goes  a  long  way  toward  resolving  the  problem  of 
travelling references if most of the code that mails IDs around is code that is executed with
CALL and SEND³.

Another  way  to  shut  down  the  machine  might  be  to  disable  the  queue  execution. 
This  would  cause  messages  to  back-up  in  the  queues.  Certain  messages  that  we  would  want 
to execute could be done by having the processor “walk” the queue by hand looking for
certain  types  of  messages  (such  as  garbage  collection  messages).  It  could  also  pull  items 
out  of  the  queue  and  into  the  heap  to  prevent  queue  overflow. 


Problems  With  Background  Execution 


Since, at the start of garbage collection, we stop message sends by various possible mechanisms,
our concurrent machine is effectively shut down. This violates our desire for the
collector to run in the background, in parallel with method execution.

² This, however, could lead to the difficult-to-handle problem of insufficient memory for a context allocation.
This might be likely since we are in the middle of collection. When there is not enough local memory,
the standard mechanism is to do the allocation on a foreign node. But this requires mailing references in the
[...] providing
efficient, convenient methods of preventing travelling references.

³ And this is likely to be true. Apart from CALL and SEND messages, all other messages are primitive
system messages (where the system may have to be responsible for avoiding ID mailing during collection),
and various other messages to create NEW objects and handle function returns. If we think of a CALL
or a SEND as being a function call, then this guard method will eventually stop the machine, with every
processor being idle or waiting to execute a function. This implementation has at least 3 requirements that
we must always be aware of. (1) We must ensure that all non-CALL and non-SEND messages do not
violate the rules and mail references during garbage collection time. (2) Catastrophe can occur when we run
out of memory trying to make contexts to hold the deferred execution requests.
In addition, the lack of a register set for background mode prevents any way for the
Message  Driven  Processor  to  take  advantage  of  idle  time  in  a  reasonable  way.  Since  any 
message  would  take  priority  over  background  mode,  the  register  set  will  be  trashed.  Any 
computation  done  in  background  mode  must  shut  off  interrupts,  which  instead  of  taking 
advantage  of  idle  time,  takes  advantage  of  application  execution  time!  Some  compromises 
can be made, such as having background mode start up small units of computation by sending
priority P messages, or by queuing up contexts of waiting-to-run background processes
that  are  begun  by  a  context  startup  message  send  when  the  background  loop  is  entered. 
Again,  various  improvements  should  be  examined. 

7.5  Summary 

The characteristics of the Jellybean machine necessitate a heap collector to reclaim storage.
This  collector  may  have  to  run  often  (since  our  nodes  have  such  a  small  amount  of  memory). 
A  reference  counting  approach  seems  to  be  out  since  there  is  a  large  overhead  in  changing 
the  object  reference  counts  (and  it  is  difficult  to  know  when  a  reference  is  written  over 
and  thus  deleted)  as  well  as  the  fact  that  it  cannot  handle  cyclic  structures  (if  we  insist 
that cyclic structures are illegal, that results in a big loss in terms of flexibility. If we don't
collect structures, we will rapidly run out of memory). A pointer chasing collector has
problems with travelling references (where the marker will not see the final reference to
an object because it is in the network, and thus delete the object), but seems to be the
most viable approach. It would be desirable to have the collector run in the background
without shutting the machine down, but the travelling reference problem seems to make
this  difficult. 


Chapter  8 

Support  for  Concurrent 
Programming  Languages 

I  get  by  with  a  little  help  from  my  friends. 
—  John Lennon and Paul McCartney, in “A Little Help From My Friends” (1967)

The  Jellybean  Machine  Operating  System  Software  provides  several  noteworthy 
services  to  support  concurrent  programming  languages,  both  for  functional  and  efficiency 
reasons. These include (1) the SEND and REPLY message handlers, (2) futures, (3) distributed
objects, and (4) the interaction interface.


8.1  High-Level  Languages 

8.1.1  CST 

Currently, the high-level language being used in the Jellybean Machine project is a Smalltalk-80
based language called CST (Concurrent SmallTalk) [DC]. CST uses a Lisp-like prefix
syntax, and expresses sends implicitly in a function application metaphor. CST allows
asynchronous messages to exploit concurrency, and fully utilizes the late-binding execution
model. Locks are provided for explicit synchronization, and a “distributed object” data
type exists to scatter object state over a large area. This CST code will be compiled to
intermediate code which is passed through a back end that converts the i-code to MDP
machine code and loads it into the system. The compilation and loading mechanism was
previously sketched in figure 6.4.

The rest of this chapter describes several operating system services that support the
execution  of  the  object-oriented  model  of  computation. 

8.2  SEND  and  REPLY 

As discussed in earlier chapters, the SEND message handler provides the machinery to run
a method based on the class of a receiving object and the selector symbol “sent” to the
object. In the current system, the SEND message may also describe one object to return a
value to. This return-slot is specified by passing the ID of the object to hold the returned
value (the returned value must be one word, either a primitive value such as an integer or
a symbol, or the ID pointer to the object), the slot (index into the object) number, and the
node the object is on.

The  REPLY  handler  actually  performs  the  return  of  the  value.  The  REPLY  message 
mails the target object ID, the target variable number, and the one-word return value to the
node number specified in the SEND message. When a REPLY message arrives at a node,
the returned value is stored in the indicated slot of the target object, and any processes
waiting  for  a  variable  to  be  filled  by  a  reply  are  restarted. 
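
The SEND and REPLY exchange described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the names METHODS, send, and reply,
the dictionary layout of the network, and the (node, object ID, slot) tuple are all
hypothetical and are not taken from the prototype.

```python
# Hypothetical method table keyed by (class, selector), standing in for method lookup.
METHODS = {("Counter", "add"): lambda state, n: state["count"] + n}


def reply(network, reply_to, value):
    """REPLY: store the one-word return value in the indicated slot."""
    node, object_id, slot = reply_to
    network[node][object_id][slot] = value


def send(network, receiver, selector, args, reply_to=None):
    """SEND: run the method chosen by the receiver's class and the selector.

    reply_to is an assumed (node, object ID, slot) triple naming the return slot.
    """
    method = METHODS[(receiver["class"], selector)]
    value = method(receiver["state"], *args)
    if reply_to is not None:
        reply(network, reply_to, value)


# Usage: node 0 holds object 7, whose slot 1 receives the result.
network = {0: {7: [None, None]}}
counter = {"class": "Counter", "state": {"count": 40}}
send(network, counter, "add", (2,), reply_to=(0, 7, 1))
```

The sketch runs the method synchronously; on the real machine the SEND and the REPLY
are separate messages, possibly handled on different nodes.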

8.3 Futures

8.3.1  Conforming  to  Data  Dependencies 

Data  dependencies  impose  an  order  on  execution.  If  a  computation  result  is  used  in  a 
calculation,  the  result  must  be  available  before  the  calculation  can  occur.  In  a  sequential 
processor, there is no problem. The instructions are ordered in such a way as to ensure that
previous  results  are  available  in  certain  places  before  those  values  are  needed.  In  a  dis¬ 
tributed  processor,  on  the  other  hand,  a  computation  may  take  an  indeterminate  amount 
of  time  to  complete  on  a  remote  node.  Because  of  this,  we  may  get  to  a  point  where  a  value 
is  needed  before  the  calculation  of  the  value  has  completed.  It  is  necessary  to  wait  until 
this  result  returns  before  continuing  the  calculation. 

8.3.2  The  Check’s  in  the  Mail 

This  section  details  a  mechanism  used  prominently  by  the  Jellybean  Machine  to  impose  data 
dependency  orderings  conveniently.  The  mechanism  is  quite  simple.  Whenever  a  calculation 
is  spawned  off  in  parallel,  the  destination  location  where  the  value  of  the  calculation  is  to 
be stored is filled with a specially tagged value, called a context future, indicating that the
value  will  arrive  to  the  context  in  the  future.  When  the  calculation  replies  with  the  value, 
the  future  is  overwritten  with  the  real  value  of  the  computation. 

When an access is made to a location in a context, using the value located there,
there is the possibility that the value hasn’t returned yet. We can tell, because the
location will still be filled with a context future (c-future). Any read of a location
containing a c-future will cause the processor to fault, (1) saving the processor
state in the context object and (2) marking the context as waiting for a c-future. When a
reply arrives at a context, the context is checked to see if it is waiting on a c-future. If so,
it is queued to be restarted.
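
A minimal Python sketch of this fault-and-restart cycle follows. It is an illustration
only: the class and function names are hypothetical, a Python exception stands in for the
processor fault, and a plain list stands in for the restart queue.

```python
class CFuture:
    """Specially tagged placeholder: the real value will arrive later."""


C_FUTURE = CFuture()


class FutureFault(Exception):
    """Stands in for the processor fault taken on reading a c-future."""


class Context:
    def __init__(self, n_slots):
        self.slots = [None] * n_slots
        self.waiting = False       # set when the context is suspended on a c-future
        self.saved_state = None    # processor state saved at the fault

    def spawn_into(self, slot):
        """Mark a slot as the destination of a calculation spawned in parallel."""
        self.slots[slot] = C_FUTURE

    def read(self, slot, state=None):
        """Read a slot; fault if it still holds a c-future."""
        value = self.slots[slot]
        if value is C_FUTURE:
            self.saved_state = state   # (1) save the processor state in the context
            self.waiting = True        # (2) mark the context as waiting
            raise FutureFault(slot)
        return value


def deliver_reply(context, slot, value, run_queue):
    """Overwrite the future with the real value and queue a waiting context."""
    context.slots[slot] = value
    if context.waiting:
        context.waiting = False
        run_queue.append(context)
```

Note that this version restarts the context on any reply, whichever slot it fills.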

Advantages                 Disadvantages
Simple                     Large Inertia
Transparent                Parallelism Wasted
Minimal Synchronization    False Restarts

Table 8.1: Pros and Cons of Dependency Enforcement by Futures


Let’s  examine  this  context-future  mechanism  in  a  bit  more  detail  to  see  what  it 
really  provides  us  and  what  deficiencies  it  faces.  Table  8.1  itemizes  some  of  the  advantages 
and  disadvantages  of  the  future  mechanism. 

8.3.3  Advantages 

As we said earlier, the most desirable characteristic of the c-future approach is that it is
simple to implement and understand. It fits well into the existing system, being
“optimistic” and taking advantage of the fault mechanism, the tagged architecture, and
contexts.

Being transparent to the programmer/compiler writer is desirable as well. No
burden is placed on the code generator to explicitly keep track of non-completed tasks.
No extra instructions need to be placed in-line to check for the presence of values, or to
manipulate semaphores.

Finally, the future approach only pays the price of synchronization if it is necessary.
If a value returns before it is needed, or if an arm of a conditional is never executed,
we will not need to pay the synchronization price¹.

¹Though we do require all replies to be in before we deallocate a context, so we can re-use context IDs.


CHAPTER  8.  SUPPORT  FOR  CONCURRENT  PROGRAMMING  LANGUAGES  77 


8.3.4  Disadvantages 

On  the  other  hand  there  are  several  disadvantages  to  this  approach.  The  system  is  subject 
to  high  inertia.  The  total  cost  of  halting  and  saving  a  context  and  restarting  it  when 
the  return  value  arrives  is  relatively  high.  The  worst  case  occurs  when  we  have  many 
dependencies  following  one  after  another.  Here,  we  would  keep  halting  and  restarting, 
making  very  little  progress.  It  can  be  difficult  to  gain  any  momentum,  because  of  the  time 
spent  saving  and  restarting  contexts.  This  case  isn’t  quite  so  bad  if  we  have  other  tasks 
queued  up  that  can  take  advantage  of  the  free  time,  and  if  the  replies  take  a  while  to 
arrive  (which  is  likely  to  be  the  normal  case).  The  real  question  is  one  of  balance  between 
computation  time  and  system  overhead  time. 

By  controlling  execution  on  the  grain  size  of  methods,  whenever  a  sequential  exe¬ 
cution  encounters  a  c-future  value,  the  entire  method  will  be  suspended.  Thus  once  we  hit 
a  c-future  value,  other  possibly  executable  code  in  the  method  is  not  run.  This  is  directly 
the result of basing the grain of parallelism on the unit of methods, and it has the effect of
wasting parallelism as opposed to a more fine-grain execution model.

C-futures  also  can  lead  to  a  problem  of  false  restarts  where  a  reply  for  a  different 
slot  would  restart  the  context,  which  would  immediately  halt  on  the  same  c-future  again. 
If  we  were  waiting  on  variable  A  to  return  and  a  reply  to  fill  variable  B  arrives,  the  context 
would  be  restarted  falsely,  and  when  we  read  A  we  will  hit  the  same  future  and  halt  again. 
This is rectified in the prototype implementation by using the RESOURCE-NEEDED slot
of the context to hold the number of the slot the context needs to have filled. When a REPLY
arrives, the context is only restarted if it was waiting on the slot the REPLY came to fill.
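
The RESOURCE-NEEDED check can be sketched as follows. This is a hedged illustration,
not the prototype code: the attribute name resource_needed mirrors the slot named in the
text, while the Context class, suspend_on, and deliver_reply are hypothetical.

```python
C_FUTURE = object()   # stand-in for the specially tagged c-future value


class Context:
    def __init__(self, n_slots):
        self.slots = [C_FUTURE] * n_slots
        self.resource_needed = None   # slot number being waited on, if any


def suspend_on(context, slot):
    """Record which slot the faulting read was waiting for."""
    context.resource_needed = slot


def deliver_reply(context, slot, value, run_queue):
    """Fill the slot; restart the context only if it was waiting on this slot."""
    context.slots[slot] = value
    if context.resource_needed == slot:
        context.resource_needed = None
        run_queue.append(context)
```

With variable A in slot 0 and variable B in slot 1, a context suspended on A is left alone
when the reply for B arrives, and is queued only when the reply for A arrives.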

8.4 Distributed Objects

A  final  system  characteristic  designed  to  support  efficient  high-level  language  execution  is 
the introduction of distributed objects. A distributed object is one whose state is broken
up into segments called constituent objects, and scattered across the processing network.
Its  purpose  is  to  allow  parallel  access  to  different  parts  of  an  object. 

A  single  object  can  only  be  directly  accessed  by  the  node  it  resides  on,  and  the  node 
it  resides  on  can  only  run  one  task,  implying  that  an  object  can  only  be  computed  on  by 
one  task  at  a  time.  In  the  absence  of  coherent  caching  strategies,  this  one-object — one-task 
constraint  can  potentially  severely  limit  parallelism. 

By  distributing  parts  of  the  object  over  several  nodes  we  can  provide  some  extra 
(albeit  limited)  concurrency.  The  hope  is  that  this  increase  of  concurrency  along  with  the 
fact  that  an  object-oriented  programming  model  should  provide  access  to  many  distinct 
objects  being  computed  on  at  once  will  prevent  object  bottlenecks  from  becoming  a  serious 
performance  hindrance. 

The system supports distributed objects by providing (1) allocation and (2) constituent
lookup services. When a distributed object is allocated, the system creates constituent
objects and scatters them in a reasonable way around the network. Each constituent
object (CO) has a normal object ID number which is unique for each CO, and a distributed
ID or DID which is the same for all constituents of a distributed object. This DID contains
the information necessary to locate any constituent object.

8.4.1 A Distributed ID Format

Figure 8.1 shows a possible format for a distributed ID. The DID knows the number of
constituent objects, the hometown node of the first object, and a node-unique serial
number. This prototype DID format places a limit of 256 COs per distributed object and 256
distributed objects per node.

  TAG | NUMBER OF CONSTITUENT OBJECTS | HOMETOWN-NODE ("ROOT") | SERIAL NUMBER
      | 8 bits                        | 16 bits                | 8 bits

Figure 8.1: Distributed ID Format
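
As an illustration of the field layout in figure 8.1, the following Python sketch packs and
unpacks the three data fields of a DID. It is an assumption-laden sketch: the function
names are hypothetical, the tag is left out, and the ordering of the fields within the word
(count in the high bits, serial number in the low bits) is assumed from the figure rather
than confirmed by the text.

```python
COUNT_BITS, NODE_BITS, SERIAL_BITS = 8, 16, 8   # field widths from figure 8.1


def pack_did(count, hometown, serial):
    """Pack the three DID fields into one 32-bit integer (tag omitted)."""
    fields = ((count, COUNT_BITS), (hometown, NODE_BITS), (serial, SERIAL_BITS))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (count << (NODE_BITS + SERIAL_BITS)) | (hometown << SERIAL_BITS) | serial


def unpack_did(did):
    """Recover (count, hometown node, serial number) from a packed DID."""
    serial = did & ((1 << SERIAL_BITS) - 1)
    hometown = (did >> SERIAL_BITS) & ((1 << NODE_BITS) - 1)
    count = (did >> (NODE_BITS + SERIAL_BITS)) & ((1 << COUNT_BITS) - 1)
    return count, hometown, serial
```

The two 8-bit fields are where the limits of 256 constituents per distributed object and 256
distributed objects per node come from.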

8.4.2  Dealing  out  the  Constituent  Objects 

When a distributed object is allocated, we want to have a function that maps each
constituent object to a node number. This function should have two properties. It should
(1) be easy to compute and (2) scatter objects in an acceptable manner.

The  goal  of  distribution  is  to  provide  concurrency,  so  with  this  aim  as  the  measure  of 
success,  any  distribution  scheme  would  be  equivalent.  But,  we  need  to  take  into  account  how 
the  processor  load  is  distributed  around  the  network  as  well.  There  are  two  dichotomous 
goals  of  constituent  distribution,  (1)  to  scatter  the  objects  uniformly  across  the  network  so 
there  are  no  hotspots  and  (2)  to  scatter  the  objects  locally  to  prevent  long  distance  network 
traffic. 

Dispersion  or  Locality? 

These seemingly contradictory aims argue against each other. If we scatter objects
uniformly, especially if there are very few objects, the data may lie very far away from the
majority of the computation. Even though some of the computation will migrate near the
data and spawn from there, there still may be a great deal of network traffic caused by
the processes still proceeding from the root of the computation. In time, migration of work
may balance the load appropriately, but we still have worries about uniform distribution.

$\mathit{stride} = \lfloor \mathit{nodes} / \mathit{constituents} \rfloor$
$\mathit{node}_n = (\mathit{birthnode} + n \times \mathit{stride}) \bmod \mathit{nodes}$

Figure 8.2: Distribution of Constituent Objects

On  the  other  hand,  if  we  clump  the  constituent  objects  close  together,  the  computa¬ 
tion  will  cluster  around  the  data,  and  not  hinder  the  performance  of  the  rest  of  the  network 
via  long  distance  traffic,  but  this  local  hotspot  may  overwhelm  the  computational  resources 
of  this  local  area  of  processors. 

A  Simple  Dispersal  Approach 

The first design of the distributed object system leaves this question for further study,
and adopts a simple, relatively disperse manner of dealing out constituent objects. We
adopt a simple uniform distribution strategy hoping that the load balancing mechanisms
incorporated into the system will work effectively. To ensure the efficiency of the calculation
of the function, we use the simple distribution algorithm shown in figure 8.2. The node
numbers we describe are a finite interval of numbers {n ∈ N : 0 ≤ n < nodes} we might call
ordinal node numbers, and not the system network address node numbers, which encode the
total addressing space of the network. The conversion between the two formats is simple.
Figure 8.3 shows some sample distributions for various sized networks, birthnodes, and
constituent object counts.
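
The distribution function of figure 8.2 is small enough to write out directly. The Python
sketch below works on ordinal node numbers; the function names are hypothetical, and the
guard on the constituent count is an added assumption so that the stride never drops to zero.

```python
def stride(nodes, constituents):
    """Spacing between consecutive constituent objects, in ordinal node numbers."""
    if not 1 <= constituents <= nodes:
        raise ValueError("need between 1 and `nodes` constituents")
    return nodes // constituents


def constituent_node(n, birthnode, nodes, constituents):
    """Ordinal node number of the n-th constituent object (n counts from 0)."""
    return (birthnode + n * stride(nodes, constituents)) % nodes


def deal_out(birthnode, nodes, constituents):
    """Ordinal node numbers of every constituent, starting with the root."""
    return [constituent_node(n, birthnode, nodes, constituents)
            for n in range(constituents)]


# A 4 by 4 network (16 nodes) with 4 constituents born on node 10:
print(deal_out(10, 16, 4))   # [10, 14, 2, 6]
```

Because the stride is rounded down, networks whose size is not a multiple of the constituent
count leave a slightly larger gap between the last constituent and the birthnode.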


[Figure 8.3: grid diagrams of 4 by 4 and 3 by 3 networks showing where constituent objects
land for five sample cases: 3 COs with birthnode 1, 3 COs with birthnode 0, 4 COs with
birthnode 3, 4 COs with birthnode 10, and 5 COs with birthnode 13. The legend distinguishes
ordinary constituent objects from the constituent root object on the birthnode.]

Figure 8.3: Constituent Object Distribution Examples

l = [...] x stride + birthnode
r = [...] x stride + birthnode
if l < birthnode then l = l - nodes mod constituents
if r < birthnode then r = r - nodes mod constituents
n = min(hops(currentnode, l), hops(currentnode, r))

Figure 8.4: Equations for Choosing a Nearby Constituent Object


8.4.3  Choosing  a  Constituent  Object 

We  now  have  a  first  attempt  mechanism  to  assign  node  numbers  to  each  constituent  object. 
Given  a  constituent  object,  we  can  find  the  node  of  its  residence.  For  simplicity,  we  prevent 
constituent  objects  from  being  migrated.  Now,  we  want  to  provide  an  algorithm  to  choose  a 
constituent  object  given  a  DID.  We  could  do  this  randomly,  but  in  order  to  take  advantage 
of  locality,  we  want  to  choose  a  constituent  object  that  is  reasonably  close  to  the  current 
node.  We  do  this  by  finding  the  ordinal  node  numbers  of  the  constituent  objects  on  either 
side  of  the  current  node  number  (l  and  r  for  left  and  right)  and  choose  the  one  (n)  with  the 
minimum distance in x-y hops. We have to be careful about “wraparound”. The algorithm
is  described  in  figure  8.4. 
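
One way to realize the selection just described is sketched below in Python. It should be
read as an interpretation under stated assumptions rather than a transcription of figure
8.4: ordinal node numbers are assumed to be laid out row by row on a mesh of the given
width, hops are counted as x-y (Manhattan) distance without wraparound links, and the
wraparound of ordinal numbers is handled with modular arithmetic. All names are hypothetical.

```python
def hops(a, b, width):
    """x-y distance between two ordinal nodes on a row-major mesh `width` wide."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)


def nearby_constituent(current, birthnode, nodes, constituents, width):
    """Pick the closer of the constituents on either side of the current node."""
    stride = nodes // constituents            # assumes constituents <= nodes
    offset = (current - birthnode) % nodes    # distance past the birthnode, wrapped
    # Index of the constituent at or before the current node; the last constituent
    # also covers any leftover nodes before the numbering wraps to the birthnode.
    k_left = min(offset // stride, constituents - 1)
    k_right = (k_left + 1) % constituents     # next constituent, wrapping to the root
    left = (birthnode + k_left * stride) % nodes
    right = (birthnode + k_right * stride) % nodes
    return min((left, right), key=lambda node: hops(current, node, width))
```

Only two candidates are examined, so the cost stays constant no matter how many
constituents the distributed object has.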

Chapter 9


Issues  From  a  Prototype  System 


Keep thy heart with all diligence;
for out of it are the issues of life

-- The Holy Bible, Proverbs 4:23


This chapter discusses in some detail relevant issues that arose in the design and
implementation of a prototype operating system. The following topics will be discussed:

•  The  sizing  of  the  BRAT 

•  How  to  handle  a  full  translation  table 

•  The  scarcity  of  virtual  names 

•  Out  of  memory  problems 

•  Queue  size 

•  Queues,  stacks,  and  saving  processor  state 

These  situations  are  troubling  enough  to  require  discussion.  The  actual  prototype  imple¬ 
mentation  can  be  found  in  an  appendix  at  the  end  of  the  thesis.  Specifications  of  the  system 
calls  and  message  handlers  can  also  be  found  in  the  appendices. 

9.1 Sizing the BRAT


To support the global virtual namespace, we use the Birth/Residence Address Table to
hold  the  necessary  translation  bindings.  This  serves  a  purpose  similar  to  a  page  table  in 
a  multi-level  paged  memory  system,  or  a  segment  table  in  a  segment  addressable  memory 
system.  The  BRAT  needs  to  hold  at  least 

1. virtual → physical mappings for objects residing on this node

2. virtual → node number links for objects that were born on this node, but now reside
elsewhere


9.1.1  Memory  Limitation 

But,  due  to  the  small  amount  of  memory  on  each  chip,  we  face  a  severe  restriction  on 
the  number  of  bindings  that  can  be  stored.  Reserving  room  for  system  data  structures, 
operating  system  variables,  and  the  heap,  we  are  left  with  a  paltry  amount  of  memory  for 
the BRAT. This will directly limit the number of objects creatable on a node. We must
make  a  careful  compromise  between  heap  size  and  translation  table  entries.  We  must  also  be 
able  to  purge  entries  from  the  table  when  objects  are  deleted,  stressing  an  efficient  storage 
reclamation  strategy. 


9.1.2  BRAT  Use  Scenarios 

Let’s  take  a  look  at  a  few  possible  scenarios  that  can  occur  with  object  management. 

1.  There  is  room  left  in  the  heap  and  the  BRAT  for  more  objects  to  be  allocated. 

2.  There  is  room  left  in  the  BRAT  but  no  more  room  left  in  the  heap. 

3.  The  heap  contains  many  small  objects  that  don’t  take  up  much  room,  but  fill  the 
BRAT,  so  that  no  more  objects  can  be  created. 

4.  The  heap  can  be  nearly  empty,  but  no  more  objects  can  be  allocated  because  the 
BRAT is full of entries of migrated objects.

The first case is the most desirable one; we wish we could have this happen all the time.
The  second  case  is  undesirable,  but  will  probably  happen  reasonably  often  due  to  the  small 
memory  space.  This  can  be  rectified  by  exporting  objects  to  other  nodes  to  free  up  heap 
space.  The  third  and  fourth  scenarios,  however,  occur  because  of  lack  of  translation  table 
space  due  to  the  presence  of  large  amounts  of  resident  and/or  migrated  objects.  It  is  these 
two  cases  that  we  would  like  to  minimize. 

The prototype system that was developed assumed 1K of RAM per node. Of this
memory, 424 words were reserved for processor and OS data structures. Thus each processor
is left with only 600 words to be shared between the heap and the translation table. The
question that arises is how to partition the BRAT and the heap in a reasonable manner.

9.1.3  A  Prototype  Sizing  Based  On  Average  Object  Size 

We have no measures as to object size in our system, but we might be able to suggest a
reasonable approximation of, say, 10 words per object¹. With 2 words of header for each
object, this would leave 8 words of object space. So, each object would take up 10 words
of heap space and 2 words of BRAT space, allowing $\frac{600}{12} = 50$ objects. But we also
need to reserve room for bindings of objects born on this node, but now residing elsewhere.
Let’s assume that we pick a limit for this, such as the total number of average-size objects
that could fit in the heap. This would allow us to migrate every object and STILL fill the
heap with average-sized objects. This leaves us with the following equations.

$\mathit{heapsize} + \mathit{bratsize} = \mathit{freememory}$
$\mathit{residentobjects} = \mathit{heapsize} / 10$
$\mathit{migratedobjects} = \mathit{residentobjects}$
$\mathit{bratsize} = 2 \times (\mathit{residentobjects} + \mathit{migratedobjects})$

$\Rightarrow \mathit{heapsize} = \frac{5}{7} \times \mathit{freememory}$
$\Rightarrow \mathit{bratsize} = \frac{2}{7} \times \mathit{freememory}$

¹Though of course this will depend greatly on the type of program being run.

With  600  words  of  free  space,  this  leaves  the  following  parameters. 

heapsize  =  428 
bratsize  =  172 

In a 4K RAM node, we might expect the following configuration as a reasonable one.

heapsize  =  2552 
bratsize  =  1020 


In the prototype operating system, the BRAT size has been set at 128 words, rather than
172, for ease of implementation.
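
The sizing rule can be checked with a short Python sketch. The function name and its
keyword parameters are hypothetical; the defaults are the 10-word average object and the
2-word binding assumed in this section, and rounding the heap size down is a choice made
here for illustration.

```python
def size_partition(free_words, words_per_object=10, words_per_binding=2):
    """Split free memory into (heapsize, bratsize).

    The BRAT holds one binding per resident object plus an equal number of
    bindings for migrated objects, so
        bratsize = 2 * words_per_binding * heapsize / words_per_object.
    """
    denominator = words_per_object + 2 * words_per_binding
    heapsize = free_words * words_per_object // denominator   # 5/7 of free memory
    bratsize = free_words - heapsize                           # remaining 2/7
    return heapsize, bratsize


print(size_partition(600))   # (428, 172)
```

With the default parameters the split is 5/7 heap and 2/7 BRAT, which reproduces the
428 and 172 word figures quoted for 600 free words.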


9.2  Running  Out  of  Binding  Space 


Sooner  or  later,  with  even  our  best  efforts  at  insightful  sizing  of  the  BRAT,  we  will  run 
out  of  room  to  make  any  bindings.  There  are  several  conceivable  ways  of  resolving  this 
situation. 

1.  Throw  up  your  hands  and  quit. 

2.  Forward  your  allocation  request  to  another  node. 

3.  Make  the  BRAT  bigger. 

4. “Delegate” some of the bindings in the BRAT to another node.

5. Change the hometown nodes of some virtual addresses to make other nodes responsible
for their bindings.

The current operating system implements choice 1 for the most part. There is also some
code to support choice number 2, but this is complicated by the fact that we might not be
able to allocate a context (as discussed in an upcoming section). If this mechanism could
be made to work, it might be acceptable enough, realizing that any system will break when
the nodes begin to run out of memory. The investment in a proper load-balancing policy
may alleviate this problem. The operating system also supports the resizing of the BRAT,
but because of the hashing mechanism currently used (described in an upcoming section)
arbitrary resizing of the BRAT is difficult to do.

The delegation of IDs is possible, but requires some thought. We need a way to
specify which IDs are delegated to which nodes, and this should take significantly less storage
than would be required to actually store the bindings. We could delegate ranges of IDs to
a node, but this node must have room for the range, and when this new node runs out of
room, it must also be able to delegate. This is a possibility for the future. The fifth item
in the list, changing the birthnodes of virtual addresses, would be very expensive, requiring
some synchronization and a large broadcast of messages. But perhaps this could be done
during the garbage collection phase, or offline, or at the end of the day as a background job
(given a suitably large machine).

9.3  Scarcity  of  IDs 

As  a  related  issue,  given  the  virtual  ID  format  of  16  bits  of  birthnode  and  16  bits  of  serial 
number,  each  node  can  only  generate  65536  IDs.  In  the  current  system,  it  is  likely  that 
many  applications  would  run  through  this  ID  space  in  a  fantastically  short  amount  of  time. 
Of  course,  the  time  is  dependent  on  the  applications  that  are  run,  but  we  can  sketch  a  rough 
estimate  for  how  long  we  can  run  before  running  out  of  IDs  on  a  node. 

The following calculations assume a 10 MHz processing node where the average
instruction is 1.5 cycles long. We assume that the queue is always full of work to be
done. We assume that each message-spawned task will be 200 instructions long (far
above the likely amount). We finally assume that only 10% of the tasks that come in will
involve an allocation of an object.

$$\frac{10^7\ \text{cycles}}{\text{second}} \times \frac{1\ \text{instruction}}{1.5\ \text{cycles}} \times \frac{1\ \text{task}}{200\ \text{instructions}} \times \frac{0.1\ \text{allocations}}{\text{task}} \approx \frac{3333\ \text{allocations}}{\text{second}}$$

At this rate, a node would run out of IDs in about 20 seconds. Though these numbers are
questionable at best in the absence of actual measurements, it is quite clear that the ID
space is completely inadequate. We have to have a larger virtual ID, say by having 68 bit
words rather than 36 bit words, but in the meantime it might suffice to (1) borrow bits from
the node number field or (2) attempt to re-use certain IDs. Borrowing bits would be a
short-term solution: by limiting our prototype machine to a 1K-node machine, we could get
a 64-fold increase in serial numbers, allowing a node to run for 20 minutes with the
assumptions made above. But, for simplicity’s sake, the current implementation has not
adopted this format. It would be a good idea to do this in the future until we build a
machine with larger words.
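
The estimate above is simple enough to reproduce. The Python sketch below uses the
assumptions stated in this section as default parameters; the function names are
hypothetical, and the 22-bit case corresponds to borrowing 6 bits from the node number field.

```python
def allocations_per_second(clock_hz=10e6, cycles_per_instruction=1.5,
                           instructions_per_task=200, allocation_fraction=0.10):
    """Object allocations per second on a node whose queue is never empty."""
    tasks_per_second = clock_hz / cycles_per_instruction / instructions_per_task
    return tasks_per_second * allocation_fraction


def seconds_until_exhausted(serial_bits, rate):
    """Time for one node to consume every serial number of the given width."""
    return (1 << serial_bits) / rate


rate = allocations_per_second()
print(round(rate))                                     # allocations per second
print(round(seconds_until_exhausted(16, rate), 1))     # seconds, 16-bit serials
print(round(seconds_until_exhausted(22, rate) / 60))   # minutes, 22-bit serials
```

Changing any one assumption, such as the fraction of tasks that allocate, scales the
lifetime linearly, which is why the estimate is only a rough guide.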

The  second  idea  is  a  more  interesting  research  issue.  We  already  reuse  context 
IDs  by  requiring  contexts  to  have  received  all  replies  before  they  are  put  on  the  free  list. 
This way, the number of IDs reserved for contexts (probably the most frequently allocated
object) is significantly cut. There may also be ways of reusing normal object IDs, but a
space-efficient way of noting these reused IDs may be difficult. Here are a few possible ideas
on  how  to  reuse  IDs. 

1. Keep a fixed size table of free IDs. When an object is freed, the ID will be placed in
the table. When an ID is needed, this free table will first be checked. The biggest
problem with this approach is that when the table fills, IDs will not be placed in the
table and they will be “lost” forever.

2. Provide a separate routine for allocating “short-lived” objects. These objects would
take their IDs from a common, fixed-size pool of consecutive IDs whose freeness could
be signified by a single bit for each ID. For example, we might reserve 256
“short-lived” IDs per node. The short-lived IDs’ serial numbers might range from 0 to 255
and the pool could be represented by eight 32-bit words signifying an array of 256 bits,
where a 0 indicates that the ID is in use, and a 1 indicates that it is free. If these objects
are truly short-lived, and they represent the bulk of ID requests, then this approach
might greatly extend the lifetime by conserving regular IDs.

3. Every now and then, perform an ID “garbage collection and compaction” where all
IDs are renamed to consecutive IDs, in effect compacting the ID space. This involves
similar issues to the mechanism of changing an ID’s hometown node number. It seems
to be very expensive, but it may be possible to interleave this with the normal garbage
collection.


The currently implemented mechanism only reuses context IDs (a fixed amount). No
attempt is currently made to reuse other objects’ IDs.
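
The short-lived ID pool from the second idea in the list above could look roughly like the
following Python sketch. It is not part of the prototype: the class and method names are
hypothetical, and Python integers masked to 32 bits stand in for machine words.

```python
POOL_SIZE = 256   # short-lived serial numbers 0..255
WORD_BITS = 32    # eight 32-bit words hold the 256 free/in-use bits


class ShortLivedIdPool:
    def __init__(self):
        # Every bit starts at 1, meaning every ID in the pool is free.
        self.words = [0xFFFFFFFF] * (POOL_SIZE // WORD_BITS)

    def allocate(self):
        """Return the lowest free serial number, or None if the pool is empty."""
        for index, word in enumerate(self.words):
            if word:
                bit = (word & -word).bit_length() - 1    # position of lowest 1 bit
                self.words[index] = word & ~(1 << bit)   # 0 marks the ID as in use
                return index * WORD_BITS + bit
        return None   # caller would fall back to a regular ID

    def free(self, serial):
        """Mark a short-lived serial number as free again."""
        index, bit = divmod(serial, WORD_BITS)
        self.words[index] |= 1 << bit
```

Finding a free ID costs at most eight word tests plus one bit scan, and the whole pool
occupies only eight words of node memory.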


9.4  The  Shortage  of  Memory 

Of  course,  the  scarcity  of  memory  per  node  will  also  prove  to  be  a  problem.  The  goal 
is  to  take  advantage  of  the  large  collective  memory  provided  by  the  system  (a  4096  node 
J-Machine with 4K words of memory per node would have 16 megawords of primary memory). Load
balancing  can  be  used  not  only  in  choosing  processors  to  perform  work,  but  also  in  choosing 
nodes  to  allocate  memory  from.  Simple  gradient  plane  approaches  [RF87]  can  be  used 
to  cool  down  memory  “hot  spots”.  Garbage  collection,  expanded  memory  nodes,  and  the 
sweeping  of  “dusty”  objects  to  offline  storage  are  all  possible  solutions  to  the  memory 
shortage  problem. 

The  current  prototype  operating  system  kernel  takes  two  approaches  to  memory. 
If  a  message  arrives  to  allocate  an  object,  and  there  is  not  enough  memory  available,  the 
message  is  forwarded  to  another  node.  However,  if  a  process  has  been  running  for  a  while 
and  the  node  runs  out  of  memory,  the  calling  message  cannot  simply  be  forwarded,  since 
some  work  has  already  taken  place.  Instead,  the  process  must  have  its  state  saved  in  a 
context,  and  room  must  be  made  on  this  node  by  evicting  certain  objects.  Unfortunately, 
there  might  not  be  enough  memory  to  allocate  a  context.  A  solution  out  of  this  trap  is  to 
require  that  there  always  be  one  minimal  sized  context  object  available  for  each  priority 
level.  A  check  could  be  made  in  the  CALL  and  SEND  handlers  (and  any  other  message 
handlers  that  could  fall  into  these  circumstances)  for  a  free  context. 

9.5 Queue Size

Queue sizing also proves to be a problem in the system. Since we want to be able to migrate
objects by message sends, an empty queue must always be big enough to hold any object.
This means that the queue must be as big as the heap. This is far too costly in terms
of memory in the 1K node prototype, and we have not attempted to make a fix. It would
always be possible, though admittedly tedious, to send messages in “chunks” that would be
able to fit in the queues.


9.6  Suspension  and  Processor  State 

Whenever a process suspends and plans on restarting later, it must be able to save its
processor state. This normally means its register set, but we must not forget about two
other forms of processor state, queues and stacks. When we suspend and there is a message
we want to save in the queue, we copy it out into a heap object and set the message pointer
to point to the object instead of the queue. Stacks are more of a difficulty to save and
restore, and we have decided to explicitly prohibit the saving of stack frames. So, the
operating system is given the task of ensuring it will never have to suspend and restart
with information on the stacks. This was a source of much personal misery during the
implementation of the OS (though certainly less than there would have been without the
existence of stacks).


9.7  Summary 

This  chapter  has  touched  on  just  a  few  of  the  difficulties  in  the  design  of  the  Jellybean 
Operating  System  Software.  Some  are  due  to  inadequacies  in  hardware  or  scale,  some  are 
due to lack of behavioral measurements, and some due to lack of insight. These will most
likely be thoroughly examined as the machine design progresses into subsequent stages.

Chapter 10


Performance  Evaluation 


Never  promise  more  than  you  can  perform. 

—  “Publilius  Syrus”,  Maxim  528 

This  chapter  provides  a  quantitative  performance  evaluation  of  several  important 
system  services.  Though  the  prototype  implementation  is  certainly  not  optimal  in  any  way, 
it  should  be  a  reasonable  approximation  of  an  actual  working  operating  system  kernel,  and 
as  such,  the  numbers  presented  in  the  chapter  should  be  useful  for  the  design  and  tuning 
of  the  rest  of  the  Jellybean  system.  In  addition,  we  should  be  able  to  see  what  parts  of  the 
system  need  fixing,  before  the  machine  is  fabricated. 

10.1  The  Virtual  Binding  Tables 

The virtual name manager is composed of five system routines nested in the hierarchy
shown in figure 10.1. The BRAT itself is composed of a 128-word binding table of 64
two-word bindings. Bindings are entered by a linear probing [Sed83] scheme where a hash
function determines the first choice for the location of the binding, and a linear search is
performed from there. This linear search can take a significant amount of time (at least on
the scale of average task size), so we need (1) an efficient algorithm and (2) a successful
hashing scheme. The remainder of this section examines the execution time of each BRAT
routine and presents some very preliminary hashing measurements.

Figure 10.1: The Hierarchy of the Virtual Name Manager

10.1.1  Instruction  Counts 

The BRAT_PEEK system call is the core of all of the virtual name services. It takes a
key to hash and a data word to match (not necessarily the same, since you might want to
look for the first NIL slot where a certain key could be placed, as is done when adding new
entries). The key is hashed, providing the index into the table, and a linear search with
wraparound proceeds from there. The cost of this call is between 22 and 540 instructions,
depending on how far the search has to progress. A reasonable cost approximation, C_peek,
for a search that finds the data in the nth slot is 22 + 8 x (n - 1) steps.
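The probing behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the modulo hash, the tuple layout of a binding, and the names `toy_hash` and `brat_peek` are hypothetical, since the actual hash function and word layout are not given in this section.

```python
NIL = None          # stand-in for the NIL tag used for empty slots
TABLE_SLOTS = 64    # 64 two-word bindings, as stated in the text


def toy_hash(key, slots=TABLE_SLOTS):
    """Hypothetical hash; the real BRAT hash function is not specified here."""
    return key % slots


def brat_peek(table, key, match):
    """Probe linearly (with wraparound) from hash(key) for a slot whose key
    field equals `match`. Returns (slot, n), where n counts slots examined,
    or (None, len(table)) if the probe wraps all the way around."""
    slots = len(table)
    start = toy_hash(key, slots)
    for n in range(1, slots + 1):
        slot = (start + n - 1) % slots
        entry = table[slot]
        stored = NIL if entry is NIL else entry[0]
        if stored == match:
            return slot, n
    return None, slots


table = [NIL] * TABLE_SLOTS
table[5] = (5, "addr-a")      # key 5 sits in its hashed slot
table[6] = (69, "addr-b")     # key 69 also hashes to 5 and was displaced
lookup = brat_peek(table, 69, 69)       # find an existing binding
free_slot = brat_peek(table, 133, NIL)  # find the first NIL slot for a new key
```

Passing the key and the match word separately is what lets the same routine serve both lookups and insertions, as the text notes.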

The rest of the BRAT calls utilize this BRAT_PEEK routine.

•  BRAT_XLATE looks up a binding in the BRAT and takes 27 + C_peek steps to
complete.

•  BRAT_PURGE searches the BRAT until it finds the first binding of the specified
word, and removes it from the table. This takes 30 + C_peek steps to complete.

•  BRAT_ENTER_NEW adds a new entry to the BRAT without first removing any
previous bindings. It accomplishes its task in 32 + C_peek steps.

•  The most expensive routine, potentially, is the BRAT_ENTER routine. This is
like BRAT_ENTER_NEW, but it first removes a previous binding, requiring another
BRAT search. This can take as much as 32 + 2 x C_peek steps.
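As a quick check on the figures above, the instruction counts can be written as small functions of n, the number of slots searched. The function names are illustrative only; the constants are the ones quoted in the text.

```python
def c_peek(n):
    """Cost of BRAT_PEEK when the match is found in the n-th slot probed."""
    return 22 + 8 * (n - 1)


def brat_xlate(n):
    return 27 + c_peek(n)


def brat_purge(n):
    return 30 + c_peek(n)


def brat_enter_new(n):
    return 32 + c_peek(n)


def brat_enter_max(n):
    """Upper bound quoted in the text: two searches of n slots each."""
    return 32 + 2 * c_peek(n)


best_case = {
    "BRAT_XLATE": brat_xlate(1),
    "BRAT_PURGE": brat_purge(1),
    "BRAT_ENTER_NEW": brat_enter_new(1),
    "BRAT_ENTER": brat_enter_max(1),
}
```

With n = 1 (the binding sits in its hashed slot) these evaluate to 49, 52, 54 and 76 instructions respectively.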

10.1.2  Effectiveness  of  Linear  Probing 

Evidently, the crucial factor in the effectiveness of the BRAT routines is the cost of peeking
through the BRAT, C_peek, which is a linear function of how far away from the expected hash
spot the value resides. The average distance in hash steps for a typical machine depends
greatly on (1) the application that is being run, (2) how storage reclamation is handled,
and (3) what is done when the BRAT overflows, all issues needing further study.
Nonetheless, I would like to proceed with an informal, ad hoc analysis, based on
reasonable estimates and educated guesswork. The rationale is to see if the linear probing
strategy seems to generally work, meaning that the average number of steps taken until
the entry is found is small¹.

¹It is not obvious that this will be so. In fact, it is quite easy to be concerned that this
linear rehashing approach might actually work itself into a steady state where entries were
always very far away from where they were supposed to be.

The following data was generated by a simulation program called bratsim that takes
an input pattern of references and simulates their effect on the BRAT. The size and
maximum fullness of the BRAT are specifiable. The simulator takes each reference and looks
it up in the BRAT.

•  If  the  reference  is  in  the  BRAT,  it  records  the  number  of  steps  away  from  where  it 
should  be. 

•  If  the  reference  is  not  in  the  BRAT,  it  is  entered  as  soon  as  possible  after  its  hashed 
spot. 

•  When  names  get  entered,  some  may  be  arbitrarily  deleted  to  maintain  a  maximum 
full  percentage. 

•  If  the  BRAT  fills,  a  random  slot  will  be  emptied. 
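The four rules above can be sketched as follows. This is not the original simulator; it is a minimal reconstruction under assumptions of my own: a modulo hash, integer IDs, and random eviction whenever the table reaches the configured maximum fullness. The function name and parameters are hypothetical.

```python
import random


def simulate(refs, slots=64, max_full=0.7, seed=0):
    """Replay integer references against a linearly probed table.

    Returns (hit_distances, enter_distances): how many steps past the
    hashed slot each hit was found at, and each new entry was placed at.
    """
    rng = random.Random(seed)
    table = [None] * slots
    limit = max(1, min(slots, int(max_full * slots)))
    hit_distances, enter_distances = [], []
    for ref in refs:
        start = ref % slots                      # hypothetical hash
        found = next((d for d in range(slots)
                      if table[(start + d) % slots] == ref), None)
        if found is not None:                    # rule 1: record distance
            hit_distances.append(found)
            continue
        occupied = [i for i, v in enumerate(table) if v is not None]
        while len(occupied) >= limit:            # rules 3 and 4: make room
            victim = rng.choice(occupied)
            table[victim] = None
            occupied.remove(victim)
        for d in range(slots):                   # rule 2: first free slot
            slot = (start + d) % slots
            if table[slot] is None:
                table[slot] = ref
                enter_distances.append(d)
                break
    return hit_distances, enter_distances
```

Averaging `enter_distances` over a long reference stream, for several values of `max_full`, gives the kind of fullness-versus-distance curve discussed next.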

The reference pattern generator is also based on initial approximations; it generates
patterns that seem likely in the applications we envision running. It is currently
configured with the following parameters: 10% new IDs, 20% context IDs, 35% recent IDs to
simulate locality, 20% less local IDs, and 15% very random IDs to simulate class/selector
bindings, method IDs and other references following less of a pattern. I would expect this
estimate to be conservative.
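A generator with this mix might look like the sketch below. Only the five percentages come from the text; what counts as "recent" or "less local" (the window sizes), the number of live contexts, and the ID space are placeholder assumptions chosen purely for illustration.

```python
import random

# Reference mix quoted in the text.
MIX = [("new", 0.10), ("context", 0.20), ("recent", 0.35),
       ("less_local", 0.20), ("random", 0.15)]


def generate_refs(count, seed=0, recent_window=8, less_local_window=64,
                  id_space=1 << 16, live_contexts=4):
    """Produce `count` integer IDs following MIX. All parameters other
    than the mix itself are hypothetical."""
    rng = random.Random(seed)
    kinds = [kind for kind, _ in MIX]
    weights = [weight for _, weight in MIX]
    contexts = [rng.randrange(id_space) for _ in range(live_contexts)]
    next_new = id_space          # fresh IDs are drawn from above id_space
    history = []
    for _ in range(count):
        kind = rng.choices(kinds, weights=weights)[0]
        if kind == "new" or not history:
            ref = next_new
            next_new += 1
        elif kind == "context":
            ref = rng.choice(contexts)
        elif kind == "recent":
            ref = rng.choice(history[-recent_window:])
        elif kind == "less_local":
            ref = rng.choice(history[-less_local_window:])
        else:
            ref = rng.randrange(id_space)
        history.append(ref)
    return history
```

The output is a plain list of IDs, so it can be fed to any table simulator that accepts a reference stream.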

Based on these estimates, and the reclamation model presented above, we can chart
how many steps away from the hashed slot particular IDs land when they are entered. For a
64 word table, this is graphed in figure 10.2. We see an asymptotic function relating BRAT
space used and the locality of entries to their intended slots. For the 64 row example, the
system begins to be unmanageable after the BRAT becomes more than 60 - 70% full.

Figure 10.3 shows the effect of doubling the BRAT size. The trend is still rapidly
increasing, but the gains we get in terms of object storage may outweigh the extra steps
involved in lookup. The flatness of the middle portion, from 40 - 60%, hints at a desirable
operating region.

So, now I would like to suggest educated guesses at the answers to the following two
questions.

1.  How full should we allow the BRAT to get?

2.  How large should the BRAT be?

In the last few paragraphs, I indicated the severity of the BRAT filling problem. After 70%
capacity, the BRAT's performance becomes intolerable. For this reason, I suggest that 70%
capacity should be an absolute maximum for BRAT fullness, and the normal operating
fullness should not usually exceed 50%. I propose this as the answer for question 1.

Question number 2 can be answered by adapting the analysis presented in the last
chapter. The new constraint equations become:

    heapsize + totalbratsize = freememory
    residentobjects = heapsize / 10
    migratedobjects = residentobjects
    bratspaceused = 2 x (residentobjects + migratedobjects)
    bratspaceused = .7 x totalbratsize
    => totalbratsize = 4/11 x freememory
    => heapsize = 7/11 x freememory

With  600  words  of  free  space,  this  reserves  218  words  for  the  BRAT  and  382  words  for  the 
heap.  This  will  hopefully  be  a  more  accurate  value,  though  it  is  not  a  power  of  2,  which 
will  complicate  the  hashing  slightly. 
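The split can be checked numerically. The sketch below solves the constraints exactly, assuming an average object size of 10 words; that figure is my inference from the 218/382 split quoted above rather than something stated in this section, and the function and parameter names are hypothetical.

```python
from fractions import Fraction


def size_brat(free_memory, max_full=Fraction(7, 10),
              words_per_object=10, words_per_binding=2):
    """Split free memory into (BRAT words, heap words).

    Constraints:
      heap + brat = free_memory
      resident    = heap / words_per_object, migrated = resident
      words_per_binding * (resident + migrated) = max_full * brat
    which gives brat = k * heap with k as computed below.
    """
    k = Fraction(2 * words_per_binding) / (max_full * words_per_object)
    heap = Fraction(free_memory) / (1 + k)
    brat = free_memory - heap
    return brat, heap


brat_words, heap_words = size_brat(600)
```

For 600 free words this yields 2400/11 and 4200/11, which round to the 218 and 382 words quoted in the text.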

The efficient manipulation of the BRAT is crucial to the success of the Jellybean
system. Future study is needed to evaluate hashing functions, and perhaps a form of linear
re-hashing is desired, where the first hash is followed by a number of subsequent
hashes instead of a linear search. In addition, once real applications are run, we can get a
better idea of how the system will behave. Likewise, the translation buffer performance needs
analysis, as this will indicate how often BRAT lookup occurs.

10.2  Object Allocation

A common task of the Jellybean Operating System Software is to allocate objects from the
heap. This section examines how costly this operation can be.

Figure 10.4 describes the nesting of services required to perform the NEW system
call. The ALLOC routine takes 24 instructions, generating a new ID takes 19 instructions,
and entering the new ID into the BRAT takes 32 + C_peek instructions. With 20 cycles
for inter-module glue, the NEW system call takes 95 + C_peek instructions. According to
the BRAT analysis results, if we operate at less than 70% full, we will have to take fewer
than 10 steps to enter a new ID. Taking n = 10 gives C_peek = 94 steps, and therefore
NEW should take 95 + 94 = 189 instructions. At best, with 0 extra steps to search
(n = 1, so C_peek = 22), the NEW call would take 117 steps.
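The arithmetic for NEW can be reproduced directly from the component counts. The constant names below are illustrative; the values are those given in the text.

```python
ALLOC = 24                # ALLOC routine
GENID = 19                # generate a new ID
ENTER_NEW_BASE = 32       # BRAT_ENTER_NEW, excluding the peek
GLUE = 20                 # inter-module glue


def c_peek(n):
    """BRAT_PEEK cost when the slot is found on the n-th probe."""
    return 22 + 8 * (n - 1)


def new_cost(n):
    """Instruction count of NEW when the BRAT entry needs n probes."""
    return ALLOC + GENID + ENTER_NEW_BASE + GLUE + c_peek(n)


best = new_cost(1)        # ID lands in its hashed slot
typical = new_cost(10)    # ten probes, the bound used in the text
```

These evaluate to 117 and 189 instructions, matching the figures above.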

10.3  Context  Allocation 

Another commonly executed routine is the NEW_CONTEXT system call. As described in
chapter 5, this service was expected to be expensive enough to merit special treatment. The
context free list was developed to provide a pool of pre-allocated contexts for fast context
allocation. The flowchart in figure 10.5 shows the steps taken by the routine. Note that if
the requested context is of an abnormal size, or if there are no pre-allocated contexts on
the free list, the NEW routine is called to allocate a new object. Requesting an abnormally
sized context takes 25 + C_new instructions, allocating a context when none are on the free
list takes 27 + C_new instructions, but allocating a context off the free list takes only 20. If
we can keep contexts in the pool, we will do well.
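To see how much the free list matters, the three cases can be combined into a simple cost model. The default `c_new` of 117 is the best-case NEW cost from section 10.2; the `hit_rate` parameter (the fraction of normal-sized requests served from the free list) is a hypothetical quantity introduced only for this illustration.

```python
def new_context_cost(normal_size=True, free_list_empty=False, c_new=117):
    """Instruction count for NEW_CONTEXT in each of the three cases."""
    if not normal_size:
        return 25 + c_new          # abnormal size: always falls back to NEW
    if free_list_empty:
        return 27 + c_new          # normal size, but nothing pre-allocated
    return 20                      # served from the context free list


def expected_cost(hit_rate, c_new=117):
    """Average cost of normal-sized requests for a given free-list hit rate."""
    return hit_rate * 20 + (1 - hit_rate) * (27 + c_new)
```

Even in the best case for NEW, a free-list miss costs 144 instructions against 20 for a hit, so the average degrades quickly as the hit rate falls.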

Freeing contexts is also fast, taking only 25 instructions. This is only about 10%
of the time it used to take to perform this operation, when we were required to purge the
old context ID, generate a new one, and place the new ID in the context and BRAT. By
preventing late replies to contexts, we have prevented this performance loss.

Figure 10.5: Flowchart for the NEW_CONTEXT System Call

10.4  Boot  Code  and  Message  Handlers 

Let's  conclude  the  chapter  with  a  brief  discussion  of  the  complexity  of  the  Bootstrap  code 
and  several  message  handlers.  The  boot  code  is  run  when  each  processor  is  powered  up, 
and places the processor in a runnable state. Altogether, it takes 5005 steps to boot the
processor.  This  is  made  up  of  4103  steps  to  erase  the  memory,  481  steps  to  initialize  the 
context  free  list  with  3  contexts,  247  steps  to  fill  the  exception  vector  table,  86  steps  to  fill 
the  extended  call  table  and  72  steps  to  set  up  the  stacks,  queues  and  other  values. 

The WRITE message handler takes 8 + 7 x l + 3 steps to send l words of data. The
READ message handler takes 8 steps to read an empty message, or 7 + 5 x (l - 1) steps to
read a block of data of length l.

The CALL message handler can exhibit several possible times. If the method being
CALLed is local and its translation is in the cache, it only takes 6 instructions to start it
executing. If the method is local, but not in the cache, it takes 64 + C_peek steps, because
the XLATE exception handler takes 58 + C_peek steps to complete. If the method is not
local, message sends are involved, making it more difficult to analyze.
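The two local cases can be expressed the same way. The split of 64 into 6 dispatch instructions plus the 58-instruction XLATE handler follows from the figures in the text; the function and constant names are illustrative.

```python
DISPATCH = 6              # start a local method whose translation is cached
XLATE_HANDLER_BASE = 58   # XLATE exception handler, excluding the peek


def c_peek(n):
    """BRAT_PEEK cost when the binding is found on the n-th probe."""
    return 22 + 8 * (n - 1)


def call_cost(in_cache, n=1):
    """Instruction count to start a local method via the CALL handler."""
    if in_cache:
        return DISPATCH
    return DISPATCH + XLATE_HANDLER_BASE + c_peek(n)
```

A cache miss therefore costs at least 86 instructions (n = 1), roughly fourteen times the cached case, which is why the translation cache matters so much.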

10.5  ROM  Size 

Out of the 1024 words reserved for ROM, the operating system prototype uses 760.

10.6  Summary

This chapter presented a brief performance evaluation of several important parts of the
Jellybean system. In addition to analyzing the cost of routines, several more fundamental
issues were noticed. These are itemized below.

•  The  BRAT  needs  to  be  searched  efficiently.  The  linear  probing  method  used  can  take 
a  significantly  long  time  if  values  get  placed  far  from  their  intended  position. 

•  Based  on  preliminary  simulation,  the  performance  becomes  unacceptable  when  the 
BRAT  gets  to  60  to  70  percent  full.  We  can  choose  a  maximum  fullness,  and  derive 
the  BRAT  and  heap  sizes  based  on  the  fullness  value  and  the  expected  size  of  objects. 

•  We  note  that  even  with  an  insightful  configuration  of  the  BRAT,  a  translation  cache 
is  required.  The  configuration  of  the  cache  is  left  to  further  study. 

•  Creating a new object is more expensive than we would like (a minimum of 117
instructions). This could be optimized with clever coding, but not much more performance
could be gained in this manner. The problem is more fundamental, resting on the
performance of the cache and the BRAT lookup.

•  The caching of free contexts seems to work well. Creating a new context requires
only 20 instructions if there is a context on the free list (and assuming we don't get
a translation fault). This is compared to a minimum of 144 instructions without a
context on the free list. Freeing a context is also fast, only 25 instructions.

•  Calling a method takes only 6 instructions if the method is local and its
translation is in the cache. If it is not in the cache, performance again suffers, requiring a
minimum of 86 instructions.


Table 10.1 summarizes some of the more important performance statistics presented in this
chapter.

Routine          Instruction Count            Notes
---------------  ---------------------------  -------------------------
BRAT_PEEK        C_peek = 22 + 8 x (n - 1)    n = slots to search
BRAT_XLATE       27 + C_peek
BRAT_PURGE       30 + C_peek
BRAT_ENTER_NEW   32 + C_peek
BRAT_ENTER       32 + 2 x C_peek              maximum
ALLOC            24
GENID            19
NEW              95 + C_peek
NEW_CONTEXT      20                           with context on free list
                 27 + C_new                   no context on free list
FREE_CONTEXT     25
CALL_MSG         6                            with method ID in cache
                 64 + C_peek                  method ID not in cache

Table 10.1: Timings for Common System Services


Chapter  11 

Conclusions 


All's well that ends well

—  Shakespeare,  in  All’s  Well  That  Ends  Well  IV 

There is a time for many words,
and  there  is  also  a  time  for  sleep. 

—  Homer,  in  The  Iliad,  XI 


11.1  Summary 

The Jellybean Operating System Software is a prototype operating system kernel for the
Jellybean Machine. Its duties include object-based storage allocation, virtual distributed
naming, object migration, process definition and control, local and remote process
execution, and the support of an object-oriented calling model.

This thesis described the JOSS in some detail, including its successes and weaknesses. The
report also discusses issues in the future Jellybean operating system that were not
implemented in the prototype because of lack of support, study and time. These include
storage reclamation, resource distribution bureaucracies, and distributed objects. These will
most likely become important parts of the Jellybean operating environment in the future.

Several deficiencies may exist in the current system. Performance-wise, searching
the translation table may well be too slow. Several solutions can be proposed, including (1)
increasing the size of the BRAT and decreasing the fullness, (2) experimenting with various
hashing functions and (3) providing an effective translation buffer. Memory shortages may
prove a significant problem, and this will place an extra burden on reclamation attempts,
which are already made difficult because of the problem of travelling references.

On the other hand, if the cache works well, and if the BRAT is not very full, the
whole system seems to perform admirably. Method invocations are powerful but fast. The
context free list allows rapid creation and reuse of contexts. The global naming system and
migration provide a high degree of flexibility.

11.2  Suggestions  for  Further  Study 

This  thesis  scratched  the  surface  of  many  interesting  research  issues,  many  of  which  I  for 
one  would  be  eager  to  investigate. 

In the area of performance evaluation, the configuration and simulation of the
translation buffer and BRAT in a real life environment is important to the success of the
Jellybean Machine. Also of practical as well as theoretical interest would be the study and
evaluation of distribution hierarchies and the various manifestations of how to handle
virtual hints.

Reclamation is an important potential area of research. An efficient mechanism to
collect garbage over a distributed network would be of general interest as well, especially if
some incremental form of collection can be developed. Policies for handling out-of-memory
conditions on processing nodes are also attractive, involving selective migration of objects.

Finally, load and resource balancing policies need to be investigated, especially since
each processor can quickly become overwhelmed (being limited in power and memory
capacity). Simple gradient plane approaches might be attempted where load spreads to where
it is lower. Network analysis will also be an important factor.

11.3  Hopes 

The Jellybean Machine has the potential of being an important step in the development of
multicomputer networks. It is my hope that further study will be encouraged so that the
difficulties of machines of this genre can be resolved (memory shortages, expensive name
translation, no caching of mutable objects, need for resource balancing, etc.) and they can
show their benefits as scalable, programmable processors.
i  If  sa,  aand  with  SERBS
i  Sand  itaa  fra*  auaua 

;  Increment  RO 


lanfth  of  ■ssssrs 

Objact  10 

Sound  valua  af  abj  ID 


ua  a  hint? 

forward  nag  to  abjact 
Lan«th  a f  arts 
idraao  to  AS 
Roads r  af  sbjaat 
class  down 
Class 

Bits  of  aalactor  fiald 
Class  fiald  up 
with  aalactor 
Tor  sa  a  clssa/ aalactor 
RS  <-  Mathad  » 

RO  <-  Map  Msadar  w/a  1  snath 
R1  <•  Length  af  CALL  i 
with  aasaaRS  is 
Method* IP  to  Ri 
R1  <-  Modal Mathad* 10) 


R3  <•  Lonpth  of  srps 
If  na  arts.  Just  sand  aath-IO 
*  id  Mathad*  10 
<•  Offset  ta  arps 


i  Incraaant  are  offset 
i  Oacrabant  lanftf. 

;  If  last  art,  sand  a  and 
!  Sand  artuasnt 
I  Leap 

;  Sand  R2  and  and 


S€M0JM_ER0i 


•mod  ••  Maaaaae  Koaiar  u  niiw  mm  fin  •  w»rt  far  •  ««*» 
«iaM/Mt«etar  ealr.  Jh**  nauttaa  aaita  «M  bwuiWaMa*  Mk 
waaM  taa  c)M^aa)«aaw/t»  amt  ay.  m  mu  raaafaa  wm 
of»ar  oatlta*  iMUIMatkaa,  vttftaat  aattfa*  Dr  it  u  iiaTjii 


aw.HVMCO  (OaM)  (aaia 


»)  («Ma>« 


xw.xrrwae^HM. 


U.Mj,ao 

*0,1*0 

ajw.www.ai 


«j»n  m.ti.mmjm 
mm  «.ai 
mm  i.ac 
MM  ci.aaj.ao 

NCw_MaTHODjM_Loaa>  _  _____ 

mm  tai®5,NBt'^-,,WU’ 
MM  aj.caa.M] 
am  ao.t.at 
mb  ai.i.ai 
mo  at.i.at 

«a  'anuc^*0-WLUor 

aW-MPNOOMMLwKKLr 

OC^  MO  MC*U_»M«0r*_l«LKT0 )  1 4 

oc  _  MMotaa.iatrau^jMMB 


COBJBCT_XO.tt) 


M  <-  BIBO  af  aa«« 

aao  ta  l  aaoBor  «• 
a»  <-  "MaMao-  at* 

sraanw 

ai  <-  Korea  ooaoai 
at  <-  Boat  iWM 
a*  <-  dm  a#  aaae 

If  no  Dm  MM  Mm 
a»  <•  Oataaara 


at  <-  fkta  naOa 
ao  <-  aaaoor 


ao  <-  to  of 


i  »bmm 


NnjvnoojM_«o> 


NCV.MK  --  NUUK 


K  routine  to  create  a  now  inatanca  of  a  certain  ciaaa  and 
the  10. 


i  Ntt  (ai***of**bjact)  (ciaaa)  (reply- to)  (rooly-aalaatar)  ( optional-** ta)» 


«HJ«Qs 


NCVJNMl! 

U 


jmjaatxni 


(1.AJ1.M 

tl.Ml.tl 

TttPHN 

*o,*I,*u»Tt_oaj 

;  •••  Copy  Optional  Oata 

SYS  LDtHMN 

to.Xil.il 

tt.rai  inr.ti 

n.ti.to 

to.a.to 


M.*JPtV_WaWIT 

*0.1.  M 

al.tll.tl 
.CU.M] 
ai.i.ai 

t*.i.t* 

i 

Snt”Jy51»_xd_»it* 

ai.w.ao 


I  *0  <•  lonptb  of  object 
i  R1  <-  ciaaa 
I  hum  a  now  object 
i  U  <•  approoi  of  object 


I  W  <-  low  it  bit  peak 
t  Hi  <•  WaniQi  booPar 
i  Caat  Into  On  DfT 
i  to  <-  lmptb  of  aooaato 
I  lonoro  flrat  •  orfuaonta. 
a  loovinp  optional  Oata 
lonptb  in  M 

i  Ri  <-  offaot  into  puom 
s  M  <•  offaot  into  abject 

;  If  no  Pit*  loft.  okU 


I  U  <•  Oata  fro*  aop. 
i  Star*  Oata  in  objoet 
I  In  or  font  offaot* 


HM>(tcia>j«o«tn_UN_irrf)i4 

to 


I  tl  <•  reply  10 
l  at  <-  f  of  bite  of  » 
t  an irt  noOb  »  oawn  a  put  in  at 
I  tanp  Pact  motif  noPa 
i  aa  <-  t«D  noaooge  nooaar 
t  Nall  tut  tao  booPar 
;  Son*  tbo  tarpat  IP 
t  Imp  tbo  aalaetar 
I  ton*  now  obj  10  aa  final  arp 


iyw*K«p«  ■ 


i  mrWOjmCHBTJtnyjm  ••  IUn  the  Mthed  in  an  Mjact  and  raatart  tna 
Nil  Mat. 

!  «naoj«uaT_iicn.r  (mom-io)  («iim«-nu)> 

;  Kuna  under:  AO  abaeluta  aoda,  Un enact ad 


«TW»_IMUOT.«f«.YJ«8i 


SUB 

NOM 

CALL 

XLATt 

OC 

OB 


111. 

«.**].« 

AO 

bo.i.bo 

CLASS JtfTMIO.BI 

rwjn 


H.MJ'tLL.OBJi 

BI 


BO.fOSJ«CTJ«.AiJ.I 
AO,*OSJtCT_HOB,At  J 

AO 

M.A.M 

t.Bt 

*.«1 


AOO 

AOO 


H_A_A_cancot 


CALL 

OC 

MM 

OC 


Bi.mjTbJ 

.fAl.AIJ 

A1.Y.A1 

At.i.a 
*0.1. AO 

*NA.B_ftm»j 


C».M3.«0 

A2.B1 

A0.A1 

*ALLJWAT_tMT«BJ«.BS 

TMAJCAU 

VABJCACMOM 
C  AO, AO], At 
VMUCACMJLBWni 
"AO.AOJ.AJ 
1.AI3.A1 
.AJ.At 


no 

E 


to  keen  lanfth 
Lanfth  af  1 


•  10 


1 
1 
1 

:  toner*  aosaoa*  hi 
1  *1  <•  Claoo  af  * 

1  Net*  a  HAM  Wjaet 
1  At  <•  Ad*-aea  af  Njact 
1  BO  <•  Co*y  Bit 
i  BO  <-  HBr  aarfcad  a*  a  c any 
1  Mrk  ahjact  aa  a  copy 


BO  (lanfth  af  aaf) 

1  BO  <•  Lan  af  aathad  «/•  hare 
1  Bt  <•  Oaaraa  index 
1  B1  <•  oaatinatian  mean 

i  If  n*  Mr*  lanfth,  amt  la** 
1  BS  <•  Hard  fran  maa*f* 

<  hit  oar*  in  nathad  ahjact 


1  Ineraaant  daatinattan  in 
1  Deeraeent  1 anftn  loft 


i  AO  <-  Orifinal  earned- ID 
1  hi  <-  Nttned  cany  adaraaa 
i  Inter  in  XLATt  each* 

;  AS  <•  MAT  interne*  tcall  # 
:  inter  in  MAT 


i  At  <-  Off  eat  t*  nathad 

1  BS  <-  hard  ala*  af  eaat 
1  Bi  <•  Nathad  10  fran  1 
1  Bt  <•  Offaat  peat  1 


1  Search  the  Nathad  Cache  aireetery. 

N_B.B_StANOije.IOi 

mm  Bt.t.Bt 

SUB  BS.t.BS 

N  B1,(H,A0],flB 

ST  AO.  " 

MI  BS.' 

SB  '•OUUW.XBJ 

h_B_B_fOU«OJC_IO  1 

IM  NIL,  AO 

MM  BO.Cat.AO] 

AOO  Bt.I.Bt 

MM  (Bt.A01.BS 

MM  AO, (At, AO] 

H_B_B_BtSTAhT_CTXT _f»CN JCAO* : 

MIL  BJ.'NB.B.IXIT 

bom  mat. as 

StMD  BI 

OC  NMl  (BBSTXNT.CONTIXTJMMSVS. 


I 

1  Oaeranant  1* 

1  X*  thla  the  id  «*  uantT 
1  If  a*.  Branch  t 
1  If  lanfth  la  0,  lea* 

1  If  net  in  NC,  oh  act  efleu  Hat 

I  Bt  <•  NIL 

I  tat  10  T*  NIL 
;  faint  offaat  t*  aalt  Hat 
1  BI  <•  (car  «*it-Hat) 
i  Sat  celt  Hat  te  MIL 


1  If  cantMt  to  la  nil,  amt 
i  Bt  <-  thia  mm 

U  this 


1  Pane  a  aaaaaf*  t 

i_i«_sns )  it  1  tvs.iac 


NJULMHi 


XLATt  BS.M.XUTBaBJ 

NOM  [C0MJ«xO>nfXT,M],BS 

OB  SuOCjWrfABT.CTXTJhlaNJCnCie 


1  land  ID  te  raotart 
;  Bat  aaarm  af  eantaxt 
I  BS  <■  nant  ctxt  Q)  In  Hat 


If  net  in  1 
the  pramana  aantant  ID,  and  BS 
fdintdfd  to  do  I In*  it 


N_A_A_MOT  IMJCACNii 
NWt  NIL.Bt 

k  MucACMtL«®ru*_urr 

NOME  (M.ABl.BS 

N.B  B-LOCf  THBU  OBtBflM  LIST: 

mTl  m.-hjOLimt 

XIATt  BS,At.XW«5ij 

to  BI  .COONT  aciounct.Af  J.M 

BT  AO,'N_B_I_UNLHa(_C 


Hn  Hat.  Um  Bt  t*  held 
MtdM  to.  UM  IMN 
an  Hat. 


;  Hi  previa**  10 
;  Bt  <•  Ad Or  *f  aflan  Hat 
;  BS  <-  Car  af  avarflav  Hat 


Hat  BU,,  amt 
;  At  <-  Can tarn  Aaar 
;  aaitinf  far  tm*  Mined? 

;  If  m,  cut  cut  cut  af  Hat 


ttSTMajXMnXTjm  —  T  murtr  can  trot  to  •  contact 
mstmtjooktot  (contact- to) 

Run*  unMr:  AS  AbooluM  M) 


RCSTAHTjOOHmTJMi 

—  y.«3 

CAU. 

AtSTA«T.C0limT_t*8J 


!  SO  <-  Contact  10 
i  Trawfor  to  contact 


MIGRATE_OBJECT_MSG -- Move an object to a new node
MIGRATE_OBJECT (object-id) (node-number)
Runs under: A0 Absolute mode


Hw»Ttjs«jict_Nagi 

MM 

£2  ffi«n.£Mcr.u 

CAU.  TRAO.XCALL 
SUM EM 

MIGMTE.gbJteT_NRb.EOO: 


i  RO  <-  Object  10 
;  *1  <-  Dm!  nod*  nua 

:  Mgr  at*  the  object 


DMIGRATE_OBJECT_MSG -- Let the object reside on this node
DMIGRATE_OBJECT (object-id) (previous-residence) (object-data)*
Runs under: A0 Absolute mode, unchecked


DMGMT«_ObJECT  MM: 

susm  I 

MOVI  TM, 


TRUE, OS 
0.1 

C0.A3J.M 

W.SVS.LSUWaK.M 

M 

00,3,0 

C3.AIJ.Kt 

01,-SYG  UNJITb.AI 

ti  ,sr»Ju&£j*m.m 

TW  NAUjOC 

CJ.aJj.o 

STS  UWRMdtt  JIAO 

os.ie.AS 

o.co.mj 

Ct.AJJ.KO 

AI.R1 

M.Rt 

XCALL_SMT_OfTOU«U.  U 
THAT  XCAU. 

m.ct.Mj 

M 

0.1.  K1 

o.a.o 


(Sum.  Ii.a.rs 

IT  R2.~DMbMTE_abdbCTjSXlT 
MOM  CKt.A3J.Kt 
MM  K2.CO.AtJ 
SUO  0.1,0 

no  01,1,01 

m  ~De«dKATE_aOJ«CT_UJ» 
DM6MTE  DbJECT  EXIT : 
fBf  I 

OC  MM.-STA  tXCI<OW_X«SI0XMtJ»TJ 
SfOOt  (2.A3J.O 
MOM  000,0 
scoot*  C1.A3J.O 


Save  internet  statue 
U  <-  True 
Disable  internets 

O  <•  Neesate  header 
O  <-  Mseaaaa  lenfth 


i  O  <-  Object  length 

:  01  <*  Object  header 

:  Shift  claea  deaei 
i  01  <•  Class  of  abject 
i  Hal  locate  as  sane  aapery 
s  Kt  <-  Object  header 

;  O  <•  Unseeable  bit  _ 

;  Set  header  of  nasabjeet 
;  O  <«  Object  ID 
i  Ri  <•  Address  ef  blech 
i  Enter  ID/ ACER  in  XLATt  table 
;  AS  <-  HAT  EnterNaw  Seen  # 

;  Enter  In  HAT 
i  flit  2nd  alet  with  ID 
l  O  <•  Message  laaftb 
;  01  <-  Offset  tc  last  aa«  ward- 
;  O  <-  Offset  to  end  Of  dest 

l  At  first  data  were? 
t  If  aa,  daaa 
:  O  <•  bets  ward 
;  Put  date  ward  in  object 
i  Deemaant  O 
!  Dan raaan t  oi 


;  hag  tnt.  disable  flag 
«3T»_LEKLW7»)I3 

;  Saad  previous  node  f,  header 
;  O  <*  This  node  nuaber 
i  Sand  abj  ID  end  this  nede  A 


DMOMTE.abJECTjaG.ENOi 


i  MOW.auioiNB.ATjat  —  Notify  eld  real 

■  NOV.RtSIDING.AT  (object- id)  (residence 

■  Runt  under)  AO  Abeelute  aede,  unchecha 


ef  new  reel  dance  A  tell  blrthnads 


OON.KESIOIOBJtT.iab) 

MM  0,0 


I  ODD  to  prevent  EARLY  fault 
Cl.ASl.O  i  O  <-  Object  tO 

CE.A3J.R1  ;  hi  <-  See i dance  Neds  * 

O.Ri  i  Cache  O  •>  Ki 

*CALL_|MTJMTEO.M  j  03  <-  OAT  KOTO  Xcall  # 

THAT  XCAU.  i  Mnd  In  MAT 

Cl.Alj.R1  ;  RI  <-  Object  » 

01.  -STS  IO.IO.SITS.OJ  i  Shin  tlrthnede  nuaber  dean 

ri.tarTot.ri  t  bat  teg  tc  MT 

MM  i  Svf  _ JOC I  ( UPQATE_bIOTMOOOE_ORD<<JYl_LoO.KTS )  1 4 

01,0  *,  Sand  header  to  birtnnotfb 

(t.AJJ  ;  Send  abject  10 

CE.OJ 

NOR, AO 


O  <•  This  node  * 


;  Sand  f  u  previous  rasldanc* 


Nou.us  :ozw.«t ja*_noi 


unoATt_iiaTHM«JM  —  Natlfy  tha  birthnod*  of  tM  naw  rnaidanea,  md 
Mr*  tha  NJNI  aovMli 


UfOATl_iIATNMOOt  (MjdCt-td)  (raaldan«a-n*da)  (praul ous-wada) 


;  tuna  undart 

AO  Absolut*  aada.  uncnackad 

UMMTI_HaT)«00(JM> 

MM 

NNA.U 

t  M  <•  This  nada  # 

MM 

Ci.Mj.ao 

i  00  <•  Ohjaat  to 

KMC 

Ca.Ml.ai 

i  01  <-  Aaaldanea  Aada  * 

MM 

Cs.mJ.as 

;  as  <-  nraviaua  nada  0 

MUM. 

as.at.H 

t  uaa  fuy  nravlaualy  naraf 

tr 

a* .  ”unoATi_*im#«oa_nouAau 

If  aa.  dan**  rablnd  ataln 

ona 

ao.ai 

i  cacna  at  •>  ai 

MM 

CALL 

*CAlX_t*AT_Dr!T* ,  IU 

raan  ncALL 

i  as  <-  MATjprraa  xeaii  # 
i  Bind  In  MOT 

I’OATtJSiaTMOO*  MOUAtUt 

OC 

NMisirsjMei  (oaucCTjtovAau  j«o«m_unjnn>  it 

toot 

ai.ao 

I  Sand  htiddr  id  mldiHM 

SOM 

Cl. Ml 

s  Send  odjdtt  SD 

uf»n_*nm*a*_H»^.wo  t 


OtJ«CTJdWMlIJM  ••  Mark  tha  a*Jact  aauabla 

OUCrjWMa  <o*jdct-id) 

tuna  undart  M  Aaaaluta  aada,  unchaa*ad 


outer jwmxjmi 
—  ao.M 


XlATl 

MM 

OC 


Ci.Ml.i 

Aa.Al.XLATtJJBJ 

-awjat  WMMUJMK 

(M.aS.ai 

ai.ca.M] 


i  non  u  arauam  imlt  fault 
t  aa  <•  flkjast  ID 
s  M  <-  Cfejam 
I  ai  <-  atiaet  I 
t  at  <-  Allkat  unaavatla  *U 
t  ai  <-  Mwatia  an  tact  naadar 
Mk  in  afejaat 


OaJCCTJMMDUJMJDBi 


i  Ml* 


STJTtN  CALL  TUM 

roiMumwwtmmmuMMWMwwMWMMwwmwwmHmHmmw 


XCALL.TW  ••Call  an  awtandad  ayataa  sal) 

Kuna  undart  AO  attaaluta  aad»,  anatmaid 
Input* i  A3 

Traaaaai  A3 


KCALL.TAA1 

PUSH  AO 

DC  00  XVCCTOAt.BAat 

too  AO. A3. A3 

MOW  CA3.A03.A3 

non  ao 

NOW  A3.  IK 

XCALL  TAN  NO: 


Sava  AS 

AS  <-  Sana  of  xv— tar* 

A3  <•  Xvaewra  ♦  null  0 
A3  <•  Mail  rauttna  9 
Aaatara  AS 
Oa  w  XCALL  rauttna 


SWN_taa  —  Swaae  all  nan-aartiad  eajaata  in  tna  Map  Ooan 
tauaro*  tna  Maa. 

Auna  undar:  AO  ahadov 


nw  n 

[ri.mi.ro 

M,R1 
tt.HO 

I  W.R1 

xeoujMTjDfm.u 


•t 

•i 

ai.Mj.no 
.SWjU0U1MK.ro 
tt.M.b 
ri.ro, si 
*.W.ITIMTE 
•COW: 

M.1.R0 

R1.1.R1 


OR 


Mff.WjMi 


tt.l.tt 

[R0.M1.R1 

U.CR1.M] 

A.st«R_conv_LOon 


I  Java  Rt 

i  01  <•  Ml  *  1 

i  tt  <-  to 

i  01  <-  MQRi <00  JOostXabJ t«n> 
t  M  <•  tO 

t  Rut  I0/M0R  pair  In  cacba 
i  U  <•  (RAT  tutor  Xcall  # 
i  tutor  in  OORT 
I  Roatoro  tt 
i  Roatara  R1 

i  R0  <-  Raadar  of  objact 
i  R0  <•  Lonpui  of  objoct 
;  tt  <•  nom  arc 
I  R1  <-  Rant  Coot 


i  Copy  a  bit  o’  abjoot 


m»i 


;  The  address  of  the  block  Is  —turned  In  *1  •  At.  The  see— nytn* 

10  —platers  (101  A  IOC)  ore  filled  with  too  cwloit  10.  The 
HCAOKA  4  CONTKXT-ID  fields  oro  filled  in  by  this  —ins.  Ik* 
next -CONTEXT  a lot  «a  filled  with  NIL.  It  la  up  to  a— Heat  1—  coda 
to  fill  in  tho  100-3,  00-3,  and  Id  a lota  ainea  thoao  vale—  — p  bo 
corruptod  while  in  the  ayet—  IMP  coda.  Tho  POTffl-dPPWT  field  la 
filled  in  with  the  offset  free  the  header  of  the  santnwv  This  field 
can  be  uaed  to  pass  the  bulldlnp  of  a  palmar  to  the  potato  part  lea 
of  content. 

i  If  the  apace  naadad  la  <•  the  nor— 1  context  alae  (defined 
i  by  GONT_NOAMLJIIK),  then  a  feat  content  la  alia— ted  off  of  the 

free  Hat  If  poaalbta. 

i  buna  under i  M  absolute  nods,  unchecked 
:  Inputs i  RO 

;  Outputs:  A1.I01.tt.IM 

s  Trsahea:  AO 


NAV_CONTEXT_TW>: 

POOH  A1 

PUSH  A2 

PUSH  AO 

DC  VAA  CFAEE_L  1ST 

move  AO. fa 

POP  AO 

6T  A0.00NT_N0ANAL3IK.A1 

■T  A1.'NtN_C0NTfXT_TAP_ALU>C 

MOVC  CU.MJ.A1 

MIL  A1  ,'<NEV_CONTEXT  TAP  ALLOC 

XLATE  A1.A1.XLATK.0AJ 

XLATf  A1.tt.XLATK.0AJ 

MOVK  COONT.NKXT_80HTKXT.A1],  At 

MOVK  A0.CAf.A0] 

MONK  NIL,  AO 

MOVK  A0.CC0NT_NKXT_C0NTKXT.A1) 

POP  AK 


POP  A1 

POP  IP 

NCV_CONTlXT_TAP_  ALLOC : 

A00  AO, 9, AO 

PUM  M 

AOO  M.CONT  PKTATK  3IK.A0 

MOVK  CLAM  dMTKXT.il 
CALL  TAAPBeN 
XLATK  A0.tt.XLA1K_0AJ 
XLATf  A0.A1.XLATt_e«J 
POP  AO  - 


POP  AK 

POP  A1 


MOVK  AO. COONT  PKTATK jOPPKCT.AI] 

MOVE  NIL. AO 

M0M  A0.CC0NT_AKXT_00NTKXT.tt] 


Ss—  A1 
SO—  AK 
Save  AO 

RO  <-  Aeso  of  Cfree  Hat 
Snap  to  A2 

Aaatara  AO  with  —or  al¬ 
ia  am  >  ner—1  aim? 

If  — .  allacata  a  new  a— lint 
Al  <•  let  etnt  la  free  Hat 
If  —  nara  ner—1,  than  aiiee 
Al  <•  Content  AdOr 
M  <-  Content  Addr 
AO  <-  Atm  Context 
Paint  efr—  Hat  to  next  etnt 
00  <•  AIL 

Ere—  i— it  ctxt  ptr  (far  — ) 

Aaatara  At 
A— tare  K1 


AO  <-  Offset  t«  pats to 

Seve  —to  to  offset 

At  <-  Total  c— tent  sej  al— 

Al  <-  "eon text’  ala—  vet— 

N—e  a  n—  oajoct 

At  <-  AOOre—  of  — tact 

C a—  to  Al 

A— to—  — tats  offset 
A— tare  At 

PUI*PKTATE-OPFitT  tint  flalt 


At  <-  AIL 
Aa  next  cement 


NEW_TRP -- Trap to generate a new object

Takes the size of the object in R0 and the class in R1 and allocates a block
of memory for the object and assigns it a unique ID. The ID is
returned in R0. The header is tagged as an object header, and the
class/length field is filled in. The ID slot is filled with the
newly generated ID for this object. In addition, the XLATE cache
& BRAT are updated.

Runs under: A0 Absolute mode, Unchecked
Inputs: R0, R1

Outputs: R0

Trashes: R1


NEW_TRP:
    PUSH  I                           ; Push int. disable flag
    PUSH  A2                          ; Save A2
    PUSH  R3                          ; Save R3
    MOVE  TRUE,R3                     ; R3 <- True
    MOVE  R3,I                        ; Disable interrupts
    CALL  TRAP_MALLOC                 ; Mallocate us some memory
    LSH   R1,SYS_LEN_BITS,R1          ; Shift class past len bits
    OR    R1,R0,R1                    ; Merge class & length
    WTAG  R1,TAG_OBJHEAD,R1           ; Tag class/length as objheader
    MOVE  R1,[0,A2]                   ; Fill 1st slot with class/len
    CALL  TRAP_GENID                  ; Generate an id into R0
    MOVE  A2,R1                       ; R1 <- Address of block
    ENTER R0,R1                       ; Enter ID/ADDR in XLATE table
    MOVE  XCALL_BRAT_ENTER_NEW,R3     ; R3 <- BRAT EnterNew Xcall #
    CALL  TRAP_XCALL                  ; Enter in BRAT
    MOVE  R0,[1,A2]                   ; Fill 2nd slot with ID
    POP   R3                          ; Restore R3
    POP   A2                          ; Restore A2
    POP   I                           ; Pop Int. disable flag
    POP   IP                          ; Return
NEW_TRP_END:
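
The header-building steps of NEW_TRP (shift the class past the length bits, merge in the length, tag the word as an object header) can be sketched in Python. This is only an illustration: the field width `SYS_LEN_BITS = 10` and the tag representation are assumptions, since the listing does not give their values, and the helper names are hypothetical.

```python
# Sketch of the class/length header word built by NEW_TRP.
# SYS_LEN_BITS and the tag encoding are assumed values, not taken from the listing.
SYS_LEN_BITS = 10
SYS_LEN_MASK = (1 << SYS_LEN_BITS) - 1
TAG_OBJHEAD = "OBJHEAD"  # hypothetical stand-in for the hardware tag


def make_header(obj_class, length):
    """Return (tag, word) with the class above the length field."""
    if not 0 <= length <= SYS_LEN_MASK:
        raise ValueError("length does not fit in the length field")
    word = (obj_class << SYS_LEN_BITS) | length  # LSH then OR, as in the listing
    return (TAG_OBJHEAD, word)                   # WTAG step


def header_length(header):
    """Extract the length field, as the AND with SYS_LEN_MASK does."""
    return header[1] & SYS_LEN_MASK


def header_class(header):
    """Extract the class field by shifting the length bits away."""
    return header[1] >> SYS_LEN_BITS
```

With these assumed widths, `make_header(5, 12)` packs class 5 and length 12 into a single word from which both fields can be recovered.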


; FREE_CONTEXT -- Free up the context in ID1
;
; If the size of the context equals the normal fast context size, then
; we place the context back onto the free list after allocating a
; new ID for it (in case of late arriving context replies). Otherwise,
; the context is marked for deletion.
;
; Runs under: A0 Absolute Mode
; Input: ID1
; Trashes:


FREE_CONTEXT_TRP:
    PUSH  R0
    PUSH  R1
    MOVE  ID1,R0
    CALL  TRAP_FREE_SPECIFIED_CONTEXT
    POP   R1
    POP   R0
FREE_CONTEXT_TRP_END:


; FREE_SPECIFIED_CONTEXT -- Free up the context specified in R0
;
; If the size of the context equals the normal fast context size, then
; we place the context back onto the free list after allocating a
; new ID for it (in case of late arriving context replies). Otherwise,
; the context is marked for deletion.
;
; Runs under: A0 Absolute Mode
; Input: R0
; Trashes: R0,R1
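
The policy described in this header can be sketched in Python as follows. It is a minimal model, not the trap itself: `NORMAL_CONTEXT_SIZE`, the `Context` record, and the ID generator are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count

NORMAL_CONTEXT_SIZE = 8  # assumed "normal fast context size"


@dataclass
class Context:
    ident: int
    user_size: int
    marked_deleted: bool = False


class ContextPool:
    """Hypothetical model of the context free list."""

    def __init__(self):
        self.free_list = []
        self._ids = count(1000)  # stand-in for the GENID trap

    def free(self, ctx):
        if ctx.user_size == NORMAL_CONTEXT_SIZE:
            # Give the context a fresh ID so that a late reply addressed
            # to the old ID cannot reach the recycled context.
            ctx.ident = next(self._ids)
            self.free_list.append(ctx)
        else:
            # Odd-sized contexts are not recycled; they are only marked.
            ctx.marked_deleted = True
```

Recycling only the normal-size contexts keeps every entry on the free list interchangeable, so a later allocation can take the first entry without checking its size.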


fREE.SPECIPIED  OONTEXT_TRP: 

PUSH  A2 

XLATE  R0.AZ.XLATE_08J 

MOW  C00JECT_H0R.AI3.R1 

AMO  R1.1YS_LENJMW.R1  I 

MR  R1.4.R1 

R1.C0Nr_PSTATE_Sia.R1 
R1.CONTj5i3bML_im.R1  ; 

ST  RI .  -PRn.COirrTXT  TRP_XS*P_HD1  i 

HOVE  C0SJECT_H0R.AI3.il  I 

OR  A1.iYSJMRRJMW.R1 

HOVE  R1.C0SJECT  H9R.AS3  ; 

SR  "PREE.CONTfXT  TRP  EXIT 

FAEE_CONTEXT_TRP_KEEP_HW: 
i 

•*a  No  longer  need  to  generate  now  10  too 


Save  AS 

«  <-  Add r  of  context 
RI  <-  Header  of  context 
Ri  <-  Length  ef  context 
Subtract  A  flret  werda 
RI  <-  uaer  apace  alio 
la  uaer  apace  •  noma!  alxet 
If  ae,  add  hie  to  the  Hat 
RI  <-  Header  ef  context 
Set  deletion  bit 
Have  hdr  back  to  object 


PURGE  RO 

PUSH  I 

PUSH  R3 

MOVE  TNUE.R3 

MOVE  R3.I 

MOVE  XCALL_IRAT_PURQE , R3 

CALL  TRAP_XCALL 

CAU  TRAP.GENIO 

MOVE  R0.C0SJECT_ID.A2  3 

MOVE  A2.R1 

ENTER  R0.R1 

MOVE  AZ.R1 

MOVE  XCALL.SRAT_ENTER.R9 

CAU  TRARJCAU 

POP  R9 

POP  I 

OC  VAR_CPREE_LIIT 


I  k amove  10  RO  from  cache 

;  Save  R9 
;  R9  <•  True 
;  Olaeble  Interrupts 
;  RS  <-  Purse  Xcall  * 

:  Ream  ID  fro*  SRAT 
;  Make  a  new  to 
;  Patch  new  10  into  context 
i  RI  <•  Context  POOR 
;  Hake  new  cache  binding 
:  Ri  <•  Context  Addreaa 
i  R9  <•  Enter  xcall  * 

(  Enter  binding  in  SRAT 
i  bee  tern  RJ 
1  Rea tern  Interrupta 
i  RS  <-  Offset  to  CPREE  Hat 


CR0.AS3.R1  _ 

R1.CO0NT  NEXT  CONTEXT,  Al  3 
C0SJECT.l0.Al3.R1 

Ri,CR0.Sb3 


'  Rl.CflO 

PREE_CONTEXT_TRP  IXXT  i 

pop  6 

pop  IP 

PREE_1PECIPIEO_CONTEXT_TRP_ENO: 


i  111  <•  CFUn  bm 
l  Put  CPREE  Hat  as  next  ctxt 
i  Ri  <•  Object  10 
t  CPREE  Hat  <•  Centex*.  10 

i  Restore  AZ 

l  Return 


Ml 

IP 


EXTENDED CALL ROUTINES


; BRAT_ENTER_XTRP -- Add an ID/ADDR pair to the BRAT
;
; Runs Under: A0 Absolute Mode, Unchecked Mode
; Inputs: R0,R1
;
; Takes an ID/ADDR pair in R0 & R1 and enters the pair into the BRAT.


.ENTIA^XTAP 

AS 

PUSH 

*3 

PUSH 

AS 

PUSH 

*1 

PUSH 

AO 

MM 

AO, AS 

J  AS  <-  10 

MM 

A1.AS 

1  A3  <-  ASM 

ae 

V*S_MAT  SAS* 

1  AO  <•  Off*#*  t*  SAAT  variable 

MM 

CAO.AOJ.il 

t  HI  <•  SMT_MSC 

K 

STSLSH_SITS 

LSH 

A1.fc.A1 

t  SAIft  MAT_SASC  t#  addr  field 

oc 

OR 

VAA_SAAT  LSH8TH 

ai.cm.a9j.ai 

1  A1  <-  SAAT  twee  |  lanfth 

wrafi 

A1.TAAJISM.A1 

i  Caet  A1  into  an  ASM 

MM 

A1.AS 

1  Have  SAAT  ptr  IMP  At 

MM 

AS. AO 

i  M  <-  n  that  ma  paeeed  in 

MM 

A0.A1 

i  A1  <-  ID  that  uaa  peaces  in 

CALL 

TAAP.SAAT.POK 

:  find  effect  A  retern  in  M 

SMIL 

AO.-JMATJEMTiAJ* 

t  If  effect  I*  nil.  ua  tat  ID 

MM 

A1.A0 

:  AO  <-  »  (atlM  in  R1) 

MM 

NIL.A1 

1  R1  <-  AIL 

CALL 

TAAP_SAAT_POK 

:  Pins  offset  A  return  in  M 

SMIL 

A0,~_SAAT_EXTI*_0K 

;  If  offset  nan  nil,  atm  react 

CALL 

TAAP.OH 

t  If  no  reae.  die  far  nee. 

TJMTByK: 

AS.CA0.ASJ 

:  Put  10  in  let  alet 

ADO 

A0.1.A0 

MM 

AS.CAO.ASJ 

;  Kit  AOM  in  2nd  slot 

POP 

AO 

POP 

R1 

POP 

AS 

POP 

A3 

POP 

AS 

POP 

IP 

_I*TI*_XTPP_D«I 

; BRAT_XLATE_XTRP -- Translate an ID from the BRAT into an ADDR
;
; Runs Under: A0 Shadow, Unchecked Mode
; Inputs: R0
; Outputs: R0
;
; Takes the ID to lookup in the BRAT in R0. When the corresponding
; ADDR value is found, it is returned in R0.


8AAT_XLAT8_XTAPi 
PUSH  A3 
PUSH  A3 
PUOM  A1 


MOVC 

oc 

MM 

OC 

ISM 

OC 

OA 

vroa 


MM 

CAU. 

■MIL 

AOO 


AO, A3 

VAA  WAT  MM 

CA0.A0J.Il 
SYS  LM.OITS 
kl.AO.AI 
VAAJAAT  LEN6TH 

ai.Tao.a5j.ai 

A1.TAC_AOOA.A1 

A1.A3 

A3. AO 
A3.A1 

TAAP_OAAT_Pt« 

AO  ,MJIAAT_XLATI_AnUM 


A0.1.A0 
CA0.A3J.A0 
_*PAT_*LAH_arnjAM  s 
POP  A1 

POP  A3 

POP  A3 

POP  IP 

•AAT_XLATt_XTAP_OA)i 


I  A3  <-  10 

s  AO  <•  Off  sat  ta  MAT  variaOta 
I  A1  <-  0AAT_AAM 

!  Shift  WAT_AAM  ta  addr  fiald 

i  A3  <•  MAT  baaa  |  lanptn 
i  Cast  A3  Into  an  AOOA 
s  Nava  OAAT  ptr  inta  A3 

I  Find  offaat  A  ratum  in  AO 
s  If  AO  nil  ratum  tna  nil 

;  Pick  out  AOOA  0  ratum  In  AO 


‘  rMH-LM-dlt **P  LifOM*-  x-t-t 


; BRAT_PURGE_XTRP -- Purge an ID/ADDR pair from the BRAT
;
; Runs Under: A0 Shadow, unchecked
; Inputs: R0
;
; Takes the ID to purge in R0. The routine writes NIL into both
; the ID & ADDR slot of the entry in the table.


’.PMMXTMl 

Mn  u 

PPM  M 

PUM  PI 

MM  P» 

MOM 

PO.Rt 

DC 

0C 

IM 

K 

OR 

VTM 

me 

MPJMPT  PPM 

CP0.Ml.il 

*f*lM_PITI 

R1.il.P1 

vni  pppT  LBwm 

Pi7rM.Pi3.Pi 

P1.TPPJMR.R1 

PI.  PI 

MOW 

MOW 

CPU. 

mil 

M.M 

PI, PI 

TMP  MPTPMC 

M.  *JMPT.PUM.Pr«M 

MOW 

oc 

MO 

MOW 

P0.P1 

mo 

M.CP1.M] 

PI. 1.P1 

M.CP1  .Ml 

JOPPT.PUMJWTUPP. 

PM  M 

PM  PI 

pm  n 

PM  M 

PM  IP 

PUPBP_XTMJD«I 


i  M  <■  » 

s  PC  <-  Offa«t  ta 
:  Pi  <-  MPT. 


i  Mlft  NT.HM  u 


riM)« 

ftold 


I  M  <•  MPT  MM  |  1 
:  OMt  M  into  an  MOP 
I  Mm  MPT  Btr  into  PI 


i  PWP  off 0*1  t  return  tn  M 
:  rf  10  not  in  tools,  return 


OPPT.I 


; MIGRATE_OBJECT_XTRP -- Takes an object ID and sends object to a node
;
; The ID of the object to migrate is in R0, and the destination node
; number is in R1. If the object is not local, a MIGRATE_OBJECT_MSG
; message is sent to the residence of the object.
;
; Runs under: A0 absolute mode, unchecked
; Inputs: R0, R1
; Trashes: R2, R3
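
The two cases in this header can be sketched in Python. The sketch is hedged throughout: the function and parameter names are hypothetical, the forwarded message name is read from a damaged line of the listing, and the word order (ID, this node number, then the object words) follows the listing's comments. Removing the object from a dictionary stands in for marking its header deleted.

```python
def migrate_object(obj_id, dest_node, this_node, local_objects, residence_of):
    """Return (target_node, message_words) for one migration request.

    local_objects maps object IDs to lists of words held on this node;
    residence_of is a hypothetical lookup giving the node an ID lives on.
    """
    if obj_id not in local_objects:
        # Not local: ask the node where the object resides to migrate it.
        return (residence_of(obj_id),
                ["MIGRATE_OBJECT_MSG", obj_id, dest_node])

    # Local: ship the object itself and drop the local copy.
    words = local_objects.pop(obj_id)
    return (dest_node,
            ["MIGRATE_OBJECT_FORWARD_MESSAGE", obj_id, this_node, *words])
```

For example, with `local_objects = {7: ["w0", "w1"]}`, migrating ID 7 to node 3 yields a message for node 3 carrying both words, while migrating an unknown ID yields a request addressed to that object's residence node.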

MIGRATE  OBJECT  XTRP: 

PUSH 

I 

; 

MOVE 

TRUE. RE 

I 

MOVE 

RE.I 

XLATE 

RO . RE . XIATE.IO_TO.NOOE 

PUSH 

RO 

CHECK 

RE.TAG.AOCR.RJ 

BT 

RJ ,  -MIGRATE.OBJECT  LOCAL 

Sava  old  l-Oissble  flag 
RE  <-  True 
Disable  interrupt* 

RE  <-  Address  of  ID  in  M 
Save  id 

Is  object  local? 

If  so,  at grate  It 

Send  residence  nod*  # 

MSG:  (MIGRATE  OBJECT  MSG«STS_LEN_BITS>I1 
RO  ;  Send  aessaga  header 

POP  ro  :  Restore  object  ID 

XNOa  R0.R1  ;  Sand  object  Id  4  nod*  * 

POT  I  ;  Raster*  interrupt* 

POP  IP  i  Return 

MIGRATE  .OBJECT  LOCAL: 

PURGE  Rd  ;  Reaove  binding  frea  cache 

MOVE  XCAU.  BRAT  PURGE. RJ  :  RJ  <-  Purge  He* 11  P 

CALL  TRAP  XCAU.  j  Purge  RO  frea  BRAT 

R2.SYS  LEN  MASX.RJ  ;  RS  <-  Length  of  object 

MSG :  SYS.UNC  I  ( It*XIGRATE_OGJECT_lttO«SVS  LEMJITB  ) 

R0.R1.R0  ‘ 


MIGRATE_OGJECT_fORUARO_MESSAGE 
SENO  R2 
DC 

SENO 


DC 

ADO 

AGO 

SENOZ 

POP 

SEND 

NONE 

SENO 

MOVE 

.OBJECT. 

MOVE 

SUB 

BZ 

SENO 

AOO 

BR 

.OBJECT. 

SENOE 

DC 

OR 

MOVE 


MIGRATE.OBJECT. 


MIGRATE. 


MIGRATE. 


RO.l.RO 

RI. RO 
RO 

RO 

NNR.RO 

RO 

0.R0 

.LOOP: 

RE.AE 

R3.1.R3 

RJ ,  "MIGAATE_OBJECT_LAST 
CAO.AIJ 

RO.l.RO 

-MIGRATE  _OBJECT_L00P 
.LAST: 

[RO.AE] 

TAG  OBJHEAO:  SYS_MARR_MASK 

Ro.Co.aei.ro 

R0.C0.AE1 

I 

IP 

.XTRP_ENO: 


Add  Tength  of  object 
Add  1  for  hdr,  10.  this 
Send  node  t,  header 
RO  <-  10 
Sand  10 

RO  <-  This  node  # 

Send  this  nee 
Current  Index  •  0 


Copy  abject  address  to  AZ 
Dec  meant  length 
If  length  •  0,  send  last 
Mail  out  object  xard 
Inc  meant  1 


Send  final  object  word 
RO  <-  Deletion  mark  aeak 
Mark  header  deleted 
Store  back  Into  l 
Restore  interrupts 
Return 


EXCEPTION HANDLERS


; INVADR_EXC -- Exception handler for access of an Ax register with I bit set
; Runs under: A0 absolute mode, unchecked


INVADR_EXC:
    PUSH  R0
    PUSH  R1
    PUSH  R2
    PUSH  R3
    MOVE  TRP,R3                        ; R3 <- faulting instruction
    DC    SYS_OP0_MASK                  ; R0 <- Mask to keep OP0 field
    AND   R3,R0,R2                      ; R2 <- OP0 field
    DC    -(SYS_OP0_BITS + 2 + 2)       ; R0 <- Bits to shift down
    LSH   R3,R0,R1                      ; R1 <- Opcode
    EQUAL R1,2,R0                       ; Is opcode 2 (READR)?
    BT    R0,~INVADR_EXC_REG_ORIENTED   ; If so, treat OP0 special
    EQUAL R1,1,R0                       ; Is opcode 1 (WRITER)?
    BT    R0,~INVADR_EXC_REG_ORIENTED   ; If so, treat OP0 special

INVADR_EXC_NORMAL_OP0:
    MOVE  0,R3                          ; R3 <- 0 (means curr. priority)
    DC    %11                           ; Mask to keep Ax bits
    AND   R2,R0,R2                      ; R2 <- A index
    BR    ~INVADR_EXC_REXLATE           ; Re-translate IDx -> Ax


!NMM_e<C_afe_OtIENTED: 

lom 

DC  *11 

mb  xz.ao.Mt 

JDXLATl: 


OMM  OK  I 


MOM 

on 


Ml.*. Ml 
M3.at.Ml 

luenoMjoeio.LOsccM*: 

noBi  ioo.mo 

-1NUSCC_«NC_W>1* 
ioi.no 

~INVSOM_IXC_XLSTI 

in. mo 

~IIMAOM_OC_XUTl 
IDO  MO 

•'INVSCM.IXC.XLATt 

in '.mo 

~iNMBt_Me_xunt 

101*. MO 

“HWSCM_PC_XUTI 

in*. mo 

*  IUMntMJDC_XL*Tf 

101'  mo 

on  *rw«*_r(c_*u<Tt 

IHUnQMJWC  XLATf: 

mXTI  S0,*1  ,XL»TS_LOCAL 


1  ),M3 


i  Ml  <•  Mslattve  priority 
i  MMk  to  MM  Ml  611* 

;  Mt  <•  S  fndsn 


i  Ml  <•  (MM) 


i  mo  <•  m 
s  Mrwcn  w  xunt 
;  HO  <•  101 

I  06  in  Ml  IM  MUTC 

i  mo  <•  m 

I  Dench  *M  XUW 

i  mo  <-  no 

;  MM  W  XLATt 
■  Mt  <•  no< 
i  omm  mm  xunt 

;  MO  <-  W 

i  omm  m  xunt 
i  ho  <-  no' 

I  MW*  an*  xunt 

;  Mt  <-  m* 

;  Bench  end  xunt 


I  Ml  <- 


,  XM,  ar  ML 


M«t  «•  r*J«ct  lan’t  Imp* I  If  XUTK  faults,  u*  don't  saw*  staafcsl 


j - ........ — ....................................... 

; EARLY_EXC -- Exception handler for early queue access

; Runs under: A0 shadow

;  Trashes:  TDM 


ISMLY.IXC: 


HOW 

SS.CTDM.SO] 

MOM 

MO 

NTSC 

LON 

MO.TSSIirT.MO 

ao.-t.Mo 

SUM 

ao.i.ao 

LM 

ao.o.MO 

NTSC 

MO.TSC_IM.HO 

MUON 

no 

MOW 

CTDM.SOJ.MO 

MOM 

IM 

CSMLY_fXC_DBi 


i  Sm  Mt  in  1DM 
I  MO  <-  Msturn  Sddress 
;  Cast  IMS  an  INT 

>  Ml  ft  HO  t*  LIS  It* 

i  Mack  up  addresafphsoa 
i  Mi  ft  storm  ft*io  task 
i  cast  Mask  ims  an  IM 
i  Put*  PBtapn  If  on  staSk 

>  Msmsps  MO 

;  Matpy  Instruction 


; SEND_EXC -- Exception handler for send buffer overflow


taua  Mt  in  TDM 
Mt  <-  Msturn  MSOrass 


; Runs under: A0 shadow

j  Traahasi  TDM 


MNDJtXCi 

NOW 

MO.CTDM.Mt] 

MOM 

MO 

wtsc 

LM 

MO.TSC_INT.MO 

so.-o.So 

SUM 

ao.i.ao 

LSH 

MO.S.MO 

NTSC 

MS.TSC.lM.no 

MUCH 

no 

MOW 

CTDM.S03.S0 

MOM 

IM 

SDBJEXCJBB: 


;  Cast  Into  an  INT 
s  Oklft  MO  to  LMlt* 
i  Mask  up  eddraaa/phaa* 
i  Oklft  address  field  Mask 
;  Cast  tack  into  an  IM 
l  Nan  return  IM  an  stack 
(  Master*  Mt 
I  Matry  Instruction 


; XLATE_EXC -- Exception handler for translation fault

; Runs under: A0 Absolute Mode, unchecked

s  Trsahasi  'DPI-* 


XlMTUDCi 

NOW  MO.fTBM.MO] 
NOW  Ml.  {TOM,  MO] 
MOW  Mf.flDM.SOJ 
HOW  Ml.fTBM.SOj 


s  tow  dou  real  stars  in 
t  TDM  -  TDM  far  uss 
I  as  an  arrap 


NTSC  MO.TSC_INT.MO 


MO  <-  Current  priority  TOM 


HOVC  OO.CTDMa.AO] 

LSM  M.-7.M 

ano  w.vi.no 

•00  43,  TOMO.no 

hovc  cro.aoj.ro 

wvc  no.m 

HOVC  XCAU._ORAT_XtATS,nj 

CALL  TOAP.XCAU. 

ONXL  no .  ~XLATE_EXC_NO_iINOINB 

ENTcn  m.no 


.xLATe_nrrnr: 


m 

nj.i.nj 

nj.o.u 

nj 

CTOMO.AOJ.no 

CTCMP1.A0J.ni 

CTOM2.AoJ.na 

CTEMP3.AOJ.n3 

IP 


!  TOM*  <•  Current  priority  TOP 
;  Picli  out  »re.  register  ftotO 

;  Aoo  TEMPO  **  atnrt  of  array 
;  Load  no  «tui  wurco  10 
;  Copy  ID  to  Hi 


1  Saa  If  10  10  1ft  (RAT 
s  If  n at,  Hanoi*  no  binding 

1  In tor  pair  in  eaeno 


:  *J  <•  Datum  IP 
;  Snift  IP  until  pnaaa  is  LOO 
:  Pack  up  on*  phase 
;  no  <-  Paiiao  mat.  IP 
1  Kit  ratry  IP  an  stack 

i  Hastora  data  ragiatara 


Ratry  Pal  lad  Instruction 


XLATE_EXC_N0_8IM>IN6  i 


CTEMP4.A0J.no  i  n 

no,  -<srs  opoMTS*sys_oPi_oiTS),n2 

(1  «  SYS_0P2_BITS)  -1  ;  n 

H2.nO.R2  i  « 

H2.XLATC  aOJ.RO  ;  M 

no ,  ~XLAT?  FXC  OOJ  MOOf  ;  I 

R2.XLAtCJO_t5nOM.RO  ;  M 

nO,*XLATf_EXC Jo  TO_NOCCJ«X  ;  I 

na.kLATE  WTH&r  JO  !  M 


no  <-  Pal  lad  Inatructlan 


no,»xiATf_exc_Hrr>a>_MOOC.ji*M  i  ip  so,  prancn 


no  <•  assa  to  kaap  op2  Plaid 
02  <•  XLATC  aoda  Pr m  apt 
war*  MB  in  XLATC.OOJ  nodal 
IP  as,  branch 

Mara  ua  In  XLATE_IQ_TO JdlOCT 
IP  as,  branch 

i  Vara  ms  in  XIATIJCTMOO  nodal 


X  LA  TE.EXC  LOCAL. 

m5« 

DC 


•mp.ni 

*1111111 

m.no.  u 

n2.1tMP0.nt 

NiL.no 

no.tm.Aoj 

CTDMO.AoJ.no 

CTDMI.AOJ.m 

CTDM2.AoJ.na 

CTDMJ.AOJ.na 

IP 


tas  Oast  bust  ba  a  data  raglatarl  aaa 
i  Ri  <-  Pal  lad  nunc 
t  no  <-  Mask  ta  kaap  Oaat  Plaid 
i  R2  <-  Oaat  Plaid  of  xlatc 
s  02  <-  TaapPCOsstJ 
i  M  <•  NIL 
i  TaapOCDaat]  <-  NIL 
i  Raster*  data  ragiatara 


X  LATI.CXC.OOJ.HOCC : 

CALL  TOAP_OIC 

XLATC_CXC,HCTHOO  HOOC.JUMP: 

*  xxLATf_excjemoo_Hooc 

XLATC_EXC.tO_TO_NOOe.HOOe  i 
Move  Tnp.ii 

lsm  ni,-7.m 

ANO  01. *11,01 

aoo  ni.TDMo.ni 

hove  c«i.AO].ni 

LSM  01 .  -sys.io.id  nrrs ,  RI 

ANO  RI ,STS_I0_N00f_HA3K,R1 

HOVC  TRP.R2 

DC  *1111111 

ANO  n2.R0.R2 

AOO  R2.TDM0.Rt 

HOVC  m.CRt.AOJ 

HOVC  CTCMPO.AOJ.no 

Move  CT©Mi.AoJ,ni 

HOVC  CTDM2.A0j.n2 

HOVC  CTOM3.AOJ.RJ 

POP  IP 


;  Return 


i  Just  dl*  Par  1 


RI  <-  Palled  XLATC 
Shift  Source  bits  daun 
Just  kaap  source  bits 
Ri  <-  TDM0  ♦  Ra 
RI  <•  Source  10 
Shift  Sirtnnod*  n ussier  daun 
Just  kaap  nods  nuabar  Plaid 
*2  <•  Palled  xlatc 
RO  <-  Mask  to  keep  Oaat  Plaid 
Rt  <•  Oast  Plaid  *P  XLATC 
R2  <*  TDMO  ♦  Oast  (Rh  anlyl ) 
TDM(0*at]  •  blrthnad*  nuabar 
•eater*  data  ragiatara 


;  Return 


XLATC.CXyjemOO^HOCC: 

LSM  03.-g.R3 
SUO  R3.1.R3 
lsm  nj,g,R3 


;  Shift  IP  until  phase  la  LSR 
1  Back  up  snp  phase 
i  RS  <•  Palled  mat.  IP 


:  Now  Ri  halda  aourca  10.  a  ratry  IP  1*  m  03 


XLATC.CXC.SAVC  MSS: 
PUSM  01 
PUSM  102 


;  Sava  au ay  RI 
s  Push  102  on  stack 


CD.A3J.02 


XLATE_£XC_S£AACH JC_  IP: 

M  ii.lu 


U.2.U 

A1.CA2.AOJ.AO 

AO ,  “XLATlfXC_fOUMO.HC.IO 

C«2.A0],M 

AO ,  “XLATE  EXC.MC  LOOM 

CTO#4,AOLAO 

AO  ,  “XLAT1.EXC.HC  LOOM 

A2,tTEMM*.A0) 


KLATf_eXC.MC.LOOM: 

IMI  AO  .  “XLATfEXC.SCAACH.HC.IO 

MOW  CTfMMt.AOlAO 

■•MIL  AO  ,  “XLATf.EXC_60T.A00H 

XLATE  fXC  fMTfA  IN  OVSPMLOV  LIST: 

m5v(  A1.CCONT_AUOUACf.AS] 

DC  VAA  HCACHC  OVEAPLOW.LIST 

HOVt  AO, AS 

MOVC  C AO, AO], AO 

MOVf  AO .CCONT.NCXT  CONTEXT, AS] 

MOVE  COAJCCT.IO.ASl.AO 

MOVC  AO, CAS. AO] 

M  “XLATC_CXC_MAIL_OAOn_«THOO 

XLATE.EXC_QOT.AOOH: 

WVt  CTD#4.A0].AS 

MMC  A1.CAS.A0] 

XLATE  EXC  POUND.MC  ID: 


AS.1.AS 
CAS, AO], AO 

coajeer _io.as3.as 

AS  CAS  AO] 

AO.CCONT_NCXT_GONTtXT.AS] 


Oecrsne nt  offset 
Over— ant  lanpth 
I*  tt's  tM  14  M  want? 

If  so.  add  context  to  list 

If  sntry  not  nil,  tooa  oooln 

If  TO#S  Is  non-nil,  loop 
Entry  is  nil.  so  fill 
TB*a  with  offset  to  tnis 
eapty  pises. 

If  1sn«tN  !•  0,  Isap 

If  TE>n«  net  ml.  ms  found  an 
sspty  specs  in  tits  table. 

Assourca  •  HsthsO  10 
AO  <-  Overflow  list  sOOr 
Copy  to  AS 

AO  <-  Car  of  overflow  list 
Next  context  *  rest  of  list 
AO  <•  Context -10 
Oflow  list  <•  Context- 10 
Msi  1  for  netted 


:  AS  <-  Capty  slat  offset 
i  fill  MC  10  «ritt  netted  10 

i  Point  offset  to  wait  list 
;  AO  <•  tear  welt-llat) 

;  AS  <-  Con text -10 
:  Paint  welt-list  to  cantoxt 
i  Point  child  slot  to  the 
:  rest  of  wait- I 1st  (er  nil) 


i  Now  we  have  set  up  the  welt  list  for  the  netted, 
i  We  novo  to  nail  off  s  netted  request  to  the  nonet  nisi 
i  node  of  the  netted  In  question  (10  In  A1). 


ATCJXCjjHAtL.OAOCTJCtHOO 


SCNOSC  PI.  A3 


Ai  i  Jove  10 

TAAM.IO.TO  NOOC  ;  AI  <-  Node  nutter  O 

Ai  ,Al  i  Neve  u  AS 

ill  s  Knun  XO 

MPO :  ( ICTMOO_ACOUCST.MSfi«STS_LCN_am  )  1 3 1  SYS.IPC 
A3, AO  i  Send  deet  node  *  A  i 

NNA.AJ  i  U  <-  This  node  nun 


XLATC_CXC_CNOi 


;  Send  asthod-IO  A  this  node  P 
;  wait  far  netted  reply 


EXC.VCCTOAS: 

OC 

oc 

OC 

oc 

oc 

OC 

oc 

oc 

oc 

oc 

oc 

oc 

oc 

DC 

oc 

oc 

oc 

DC 

oc 

oc 

DC 

oc 

oc 

DC 

DC 

OC 

oc 

oc 

oc 


IP:STS_A«SmHBO_PC«STS_LfX_»ITS) 

IP:  SY1_AAS  I  (  B#TO*ULT<<lrl„LW..»rrS  )  I  OALf AULT 
IP:STSJHB|(OrTV_PAULT«m_L«HjnS)  I  ILUNPT 
IP:  SYS  JUS  I  ( yK'nf_PAULT«SYS_UN_SITS  )  i  ILAAOASO 
IP :  SVS.AM  I  ( WPTY_fAULT«SYS_UX  SITS )  i  ACCUS 

IP :  SYS.AM  I  ( B*TY_PAULT«S?S_llN_AtTS  )  :  LOOT 

IP :  SYS  JIM  |  ( DTTY_FAULT«SYS_IJEH_»rrt )  i  INVAOA 

if:  sys jiao  i  ( DrTr_p AttT<<rr».Uf  jirrs )  »  mm 
ip . SYS  .AM i ( d*ty”fault<<sys.hniits )  :  OUfUf 
IP :  SYS  JIM  I  (  SfNOjpeC«SYS_LENZ»m ) 

IP:  SYS.UNC  I SYS  AMI  (XLATCjEXKOTS JM JITS) 

IP:STS_AM|<*im>AULT<<IrS_LINjrTS)  t  AAWE 

IMlSYSJIMI  (PUSH  JST«STS_Lf5jm) 

IP:STSJIM|(PCMBe«SYS_LINjlTS) 

ip  .  m jiM  i  ( D»fr_r ault<  ?sys_l*n_iiti  ) 

ip  :  sys jim  i  (ern^f ault«sys_  u*  *its  )  »  owamlow 

IPlSYJJIMI  (BrTT_PAULT<<SYt.:*S’«TS)  1  TYPf 
IP :  SYS  JIM  |  ( BTTY_FAULT«STS_LIN_arrS  )  1  IA 

IP :  SYS jm  |  (  •TTY^AUL  KCSYS.LfN.OITS )  i  IS 
IM :  SYS jm  I  ( DTTY.f AULT«  SYS.LfN_tnS )  :  IC 
IP:SYS_AMI(DeTY_FAULT«SYS_LEN_SITS)  j  10 
IP :  STS.AM  |  ( fKPTY.f AAJLT«STS.i».am )  !  IE 
IP :  SYS  .AM  |  ( orTY.f AULT«SYS_LEN_«rrS )  1  IP 
IP!SYS_AM|(O^TY_PAULT«SYS_U*_»ITS) 

IPltnjm  |  (0*TY_rAULT«SYS_UM_SITS  ) 

IP:SYS  _AMI(B*TY_FAULT«SYS.L£N_BIT3) 

IP:  SYS  AM  |  (£>TTY_FAUIT«SYS_L£N_«ITS ) 

IM:SYS  AM|  (EttTY_PAULT<<SYS_L€N_iITS  ) 


•#r^WULt<<lu_L«jfrs)  i 
push  ?3r«m_L«5_anl) 
pcm£c<  <sys_un Jrrs ) 
••TTf AULT<?SYSlL»_SrrS  ) 


DC 

oc 
oc 
oc 
oc 
oc 
oe 
oc 
oe 
oc 
oc 
oc 
oc 
oc 
oc 
oc 
oc 
oc 
oc 

Exc_veera««t.aoi 

xcAuoaeraMi 

DC 

OC 

oc 

OC 

oc 

oc 

OC 

oc 

oc 

oe 

oe 

oc 

oc 

oc 

oc 

oc 


ip!Sy3_ami  (orrv_MuiT«SYs_L£N..arrs) 
IPtSTSjUEKEMPTY  MtM.T«SYS_l£N_DIT») 

IP:  SYS  JIM  I  (gum  f*UtT«SW  LEN  «TS) 
IMt»j3ci«V»J«5| <!S,C0NT6jT  T«J<<««  tEJLHTS) 
IP:SY»_UNC|SY»_AM|  ( P*il  .ajNTWT.TOPW J ) 

ip:sy»j«mi  (xrMjo  t»<<sy|^  *—* 

iPtsnjw(it£njSLJ*r<<n~ 

IP:SYS_UNCI  SYS  JtS]  ( NEW_T*P«SY-_ 

IP:  SY^UMUmjMI  (MUiWwTMUl 
IP :  SYS  JIM  I  (  BWID.TTIP<  <SYS_C£N  SI 

iP<sr».uwci»vs.»aitT«>»!.>roL' 

IP:*Y*_UNCI*Y*JIMI  «MP_T»P<' 

ip.sy»_uncuys .jmin&lmii  — 

iP:$rs_*j«i(*iwv.nMP«jrs_LaLsmi 

IP,(Y*_AMI  ( B^TY,TWP«svCi;ttj»rr»} 

IPiST*  UNCISYOAMI  ( XC*A_ri{p<<SYt_lJ>C»I 
IP:  (OlI_TllP«5?I_l£N_SITS7 


r_W«S«.LEI**SIT* ) 


iJIMI  ( B»TY_«CAU.«lY»_LDC»mj 
IP:SYSJJNC  |  SYS. All  I  ( •MT_SPTf*_XT*p«»YS_La(. 
IP:  SYS JPCI  JY»_4M  l(m*T_UATt_XV**<*nj 
I P:  SYSJJNC 1  SYS  JIM  I  {  MPTPUMt_XTNP«*Y*J 


IP:5YS_ 


ncrjcrtprtmj 

SSSr^ 


IPiSVS.UNCISVSJIMI  (NUM1 
IP.SYSJIMISYSAMI  (MAT, ' 
IPtSrSJWKO#TY_»C»U.O  _ 
ip«sys_ami  (Mm_»cAa«m_i  .  _ 
iPiSYtjuai  (0rrv.M«a«m.UBCi 
lPtmjmt  i0*TrjtcMj.((sn_amju  «•# 

IPi*Y»_M»l(*N^T_)OU.«m_tiCST») 
IP:SVS_«MI  (Bm.»ON4.«in_L®i.Sim5 
IP:  SYS _*•»  I  (  OTTY JCNX«SV(_LEH.N  T»> 
IPiSYSJIMI  ( e*PTY_»CAU.<  <SY5_LBL»ITS.) 
IP:  SYS  JIM  I  (EWTV>CMJ.«SVt_lJN_MTS) 
IP:SYSJlMI<BrTY  XCPU.<<SYS_LM.iITS) 
OC  IP:  SYS  JIM  |  (  ENPTY_*CALL<  <SY»_LEN_«IT»  ) 

oc  iPtmjmHOrrrjcMX<<an_Mjin) 

OC  IP:  SYS  JIM 1  ( BriYl)(C*a.<aYS_toijilTS  ) 

XCALU.VCCTOM_ENOi 


NON  CtMURta 

nenjmmzMi 

•CP  MU: 
TVlfibU: 


oc 

OC 

OC 


INT:(A0M  END  - 

o.o.o.o.j.o.o. 


1024) 


MM.ENO: 


Appendix  C 


Operating  System  Quick 
Reference 


HEAD 


FMutw  block  of  mwwn 


CALL 


SEND 


REPLY 


NEWJ«THOD 


HfcfMCMa 


■  tha.  specified 


Hi, 

**ID<CQ«a*tr4d>w  ff  the 
°om»xi  wae  waitim  Hr  this  slot, 
itwiflbenataMML 


tor  Anew 


(4n)  (dam)  (id)  (aafecks)  <<*»)* 


AttwMeieewof^offypi 


Mbvh  Re  object  win  s> 
<**jact-ite  to  rtodtMMta 


^•0®ATE_OHBCT 


System Calls

XCALL
    Arguments:   Xcall routine number in R3
    Description: Calls one of the routines defined in the extended call
                 vector table. This was implemented since the CALL vector
                 table was running out of room.

SWEEP
    Arguments:   (none)
    Description: Compacts the heap.

NEW_CONTEXT
    Arguments:   Size of user space in R0
    Description: This routine creates a new context object with R0 words
                 of user space and returns the context address in A1 and
                 A2. R0 is trashed.

NEW
    Arguments:   Size of object in R0
                 Class of object in R1
    Description: Creates a new object of size R0 and class R1, and returns
                 the object's ID in R0. R1 gets trashed.

ID_TO_NODE
    Arguments:   Object ID in R1
    Description: Returns a likely node for the object with ID R1 to be on
                 in R1.

MALLOC
    Arguments:   Block size in R0
    Description: Allocates R0 words of physical memory and returns the
                 address in A2.

FREE_CONTEXT
    Arguments:   Context ID to free in ID1
    Description: Frees the context with ID in ID1, possibly placing it on
                 the context free list.

FREE_SPECIFIED_CONTEXT
    Arguments:   Context ID to free in R0
    Description: Frees the context with ID in R0, possibly placing it on
                 the context free list. This trashes R0 and R1.

GENID
    Arguments:   (none)
    Description: Generates a new ID, and returns the ID in R0.

VERSION
    Arguments:   (none)
    Description: Returns the OS version number in R0, where the high 16
                 bits hold the major value, and the low 16 bits the minor
                 value.

XFER_ID
    Arguments:   Context ID to run in R0
    Description: Transfers control to the context whose ID is in R0. This
                 never returns.

XFER_ADDR
    Arguments:   Context address in A1
    Description: Transfers control to the context whose address is in A1.
                 This never returns.

BRAT_PEEK
    Arguments:   ID to hash in R0
                 ID to search for in R1
                 Base of BRAT table in A2
    Description: Hashes the ID in R0 to find a first slot in the BRAT to
                 search. A linear search proceeds from there until the ID
                 in R1 is found. When found, the offset from the start of
                 the BRAT where this entry is located is returned. If not
                 found, NIL is returned.
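
The BRAT_PEEK search (hash to a first slot, then search linearly) and its use by the enter and translate routines can be sketched in Python. This is a simplified model under stated assumptions: the hash function, the wrap-around at the end of the table, and the class and method names are all hypothetical. The two-word entry layout (ID in the first slot, ADDR in the second) follows the listing's comments.

```python
NIL = None


class Brat:
    """Toy model of the BRAT as a flat table of (ID, ADDR) word pairs."""

    def __init__(self, pairs):
        self.pairs = pairs
        self.table = [NIL] * (2 * pairs)

    def _hash(self, ident):
        return ident % self.pairs  # assumed hash; the real one is not shown

    def peek(self, hash_id, search_for):
        """Hash hash_id to a first entry, then scan for search_for.

        Returns the word offset of the matching entry, or NIL.
        """
        start = self._hash(hash_id)
        for step in range(self.pairs):
            offset = 2 * ((start + step) % self.pairs)
            if self.table[offset] == search_for:
                return offset
        return NIL

    def enter(self, ident, addr):
        """Reuse the ID's entry if present, else take the first empty one."""
        offset = self.peek(ident, ident)
        if offset is NIL:
            offset = self.peek(ident, NIL)  # search for an empty slot
        if offset is NIL:
            raise MemoryError("BRAT is full")
        self.table[offset] = ident          # ID in 1st slot
        self.table[offset + 1] = addr       # ADDR in 2nd slot

    def xlate(self, ident):
        """Return the ADDR bound to ident, or NIL when there is no binding."""
        offset = self.peek(ident, ident)
        return NIL if offset is NIL else self.table[offset + 1]
```

Passing the hashed ID and the searched-for value separately is what lets the same search locate either an existing binding or an empty slot near the ID's hash position.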


BRAT_XLATE


BRAT_PURGE

MIGRATE_OBJECT